The Theory of Probability


by B. V. GNEDENKO 


Translated from the Russian 
by GEORGE YANKOVSKY 


MIR PUBLISHERS • MOSCOW 



First published 1969 
Second printing 1973 
Third printing 1976 
Fourth printing 1978 


In English


© English translation, Mir Publishers, 1978 



Contents 


Introduction

Chapter 1. THE CONCEPT OF PROBABILITY

Sec. 1. Certain, Impossible, and Random Events

Sec. 2. Different Approaches to the Definition of Probability

Sec. 3. The Sample Space

Sec. 4. The Classical Definition of Probability

Sec. 5. The Classical Definition of Probability. Examples

Sec. 6. Geometrical Probability

Sec. 7. Frequency and Probability

Sec. 8. An Axiomatic Construction of the Theory of Probability

Sec. 9. Conditional Probability and the Most Elementary Basic Formulas

Sec. 10. Examples

Exercises

Chapter 2. SEQUENCES OF INDEPENDENT TRIALS

Sec. 11. Independent Trials. Bernoulli’s Formulas

Sec. 12. The Local Limit Theorem

Sec. 13. The Integral Limit Theorem

Sec. 14. Applications of the Integral Theorem of DeMoivre-Laplace

Sec. 15. Poisson’s Theorem

Sec. 16. An Illustration of the Scheme of Independent Trials

Exercises

Chapter 3. MARKOV CHAINS

Sec. 17. Markov Chains Defined. Transition Matrix

Sec. 18. Classification of Possible States

Sec. 19. Theorem on Limiting Probabilities

Sec. 20. Generalizing the DeMoivre-Laplace Theorem to a Sequence of Chain-Dependent Trials

Exercises

Chapter 4. RANDOM VARIABLES AND DISTRIBUTION FUNCTIONS

Sec. 21. Basic Properties of Distribution Functions

Sec. 22. Continuous and Discrete Distributions

Sec. 23. Multidimensional Distribution Functions

Sec. 24. Functions of Random Variables

Sec. 25. The Stieltjes Integral

Exercises

Chapter 5. NUMERICAL CHARACTERISTICS OF RANDOM VARIABLES

Sec. 26. Mathematical Expectation

Sec. 27. Variance

Sec. 28. Theorems on Expectation and Variance

Sec. 29. Mathematical Expectation Defined in the Axiomatics of Kolmogorov

Sec. 30. Moments

Exercises

Chapter 6. THE LAW OF LARGE NUMBERS

Sec. 31. Mass-Scale Phenomena and the Law of Large Numbers

Sec. 32. Chebyshev’s Form of the Law of Large Numbers

Sec. 33. A Necessary and Sufficient Condition for the Law of Large Numbers

Sec. 34. The Strong Law of Large Numbers

Exercises

Chapter 7. CHARACTERISTIC FUNCTIONS

Sec. 35. Definition and Elementary Properties of Characteristic Functions

Sec. 36. The Inversion Formula and the Uniqueness Theorem

Sec. 37. Helly’s Theorems

Sec. 38. Limit Theorems for Characteristic Functions

Sec. 39. Positive Definite Functions

Sec. 40. Characteristic Functions of Multidimensional Random Variables

Exercises

Chapter 8. THE CLASSICAL LIMIT THEOREM

Sec. 41. Statement of the Problem

Sec. 42. Lyapunov’s Theorem

Sec. 43. The Local Limit Theorem

Exercises

Chapter 9. THE THEORY OF INFINITELY DIVISIBLE DISTRIBUTION LAWS

Sec. 44. Infinitely Divisible Laws and Their Basic Properties

Sec. 45. The Canonical Representation of Infinitely Divisible Laws

Sec. 46. A Limit Theorem for Infinitely Divisible Laws

Sec. 47. Statement of the Problem of Limit Theorems for Sums

Sec. 48. Limit Theorems for Sums

Sec. 49. Conditions for Convergence to the Normal and Poisson Laws

Exercises

Chapter 10. THE THEORY OF STOCHASTIC PROCESSES

Sec. 50. Introductory Remarks

Sec. 51. The Poisson Process

Sec. 52. Conditional Distribution Functions and Bayes’ Formula

Sec. 53. Generalized Markov Equation

Sec. 54. Continuous Stochastic Processes. Kolmogorov’s Equations

Sec. 55. Purely Discontinuous Stochastic Processes. The Kolmogorov-Feller Equations

Sec. 56. Homogeneous Stochastic Processes with Independent Increments

Sec. 57. The Concept of a Stationary Stochastic Process. Khinchin’s Theorem on the Correlation Coefficient

Sec. 58. The Concept of a Stochastic Integral. The Spectral Decomposition of Stationary Processes

Sec. 59. The Birkhoff-Khinchin Ergodic Theorem

Chapter 11. ELEMENTS OF QUEUEING THEORY

Sec. 60. A General Description of the Problems of the Theory

Sec. 61. Birth and Death Processes

Sec. 62. Single-Server Queueing System

Sec. 63. A Limit Theorem for Flows

Sec. 64. Elements of the Theory of Stand-by Systems

APPENDIX

BIBLIOGRAPHY

SUBJECT INDEX






































Introduction 


This book aims to give an exposition of the fundamentals of the
theory of probability, a mathematical science that treats of the
regularities of random phenomena.

The theory of probability originated in the middle of the
seventeenth century and is associated with the names of Huygens,
Pascal, Fermat, and James Bernoulli. The correspondence between
Pascal and Fermat dealing with problems in games of chance that did
not fit into the framework of the mathematics of those days laid the
foundations for such important concepts as probability and
mathematical expectation. We must be clear on one point: the famous
scientists that dipped into these gambling problems also foresaw
the fundamental role of the science that studies random events. They
were convinced that clear-cut regularities could arise on the basis
of large numbers of random events. Because of the low level of
development of natural science in that period, however, games of
chance and also problems of insurance and demography were for a long
time the sole concrete material used in building up concepts and
methods of the theory of probability. This circumstance placed its
imprint on the formal mathematical apparatus applied in the solution
of probability problems: it consisted exclusively in the use of
elementary arithmetic and combinatorial methods. The subsequent
development of probability theory and also the broad application of
its results and methods of investigation to natural science, in
particular to physics, demonstrate that the classical concepts and
methods still hold today.

The rigorous demands of the natural sciences (the theory of errors
of observation, problems in the theory of gunfire and of statistics,
primarily population statistics) called for a further development of
probability theory and the use of a more sophisticated analytical
apparatus. A particularly important role in the development of
analytical methods of probability theory was played by DeMoivre,
Laplace, Gauss, and Poisson. Associated with this trend, in a formal
analytical manner, is the work of the founder of non-Euclidean
geometry, N. I. Lobachevsky, devoted to the theory of errors in
measurements performed on a sphere and carried out for the purpose of
establishing the dominant geometrical system of the universe.

From the mid-19th century to the twenties of the present century,
the development of probability theory was largely connected with
Russian scientists: P. L. Chebyshev, A. A. Markov, and A. M. Lyapunov.
This success of Russian science was prepared by the work of
V. Ya. Bunyakovsky, who cultivated applications of probability
theory to statistics, in particular in the field of insurance and
demography. He wrote the first Russian course of probability theory,
which exerted a profound effect in the field and excited much interest.
The abiding principal value of the work of Chebyshev, Markov and
Lyapunov in probability theory consists in the fact that they
introduced and widely applied the concept of a random variable. In this
book we shall take up Chebyshev’s studies in the law of large numbers,
Markov chains and Lyapunov’s limit theorem.

Today, probability theory is extending its influence and practical
application to many spheres, and research throughout the world
has enriched the theory with important results. In this great
upsurge, the Soviet school of probability theory continues to occupy
a prominent position. Foremost among the Soviet workers are
S. N. Bernstein, A. N. Kolmogorov, and A. Ya. Khinchin. The ideas of
and results obtained by present-day scientists that have revolutionized
the theory of probability will be introduced as the subject matter
demands. In the very first chapter we will speak of the fundamental
studies of Bernstein and Kolmogorov dealing with the foundations
of probability theory. In the first decade of this century, E. Borel
pointed to ideas connecting the theory of probability with the metric
theory of the functions of a real variable. Somewhat later, in the
1920s, A. Ya. Khinchin, A. N. Kolmogorov, E. E. Slutsky, P. Lévy,
A. Lomnitsky and others considerably elaborated these ideas,
which proved extremely fruitful for the development of science.
It will be noted, for one thing, that it was precisely in this direction
that a definitive solution of the classical problems posed by Chebyshev
was found. The principal advances of this trend are due to
J. W. Lindeberg, S. N. Bernstein, A. N. Kolmogorov, A. Ya. Khinchin,
William Feller, Paul Lévy, and others. The ideas of the metric theory of
functions and, subsequently, of functional analysis made it possible
to extend substantially the content of probability theory. The 1930s
saw the foundations laid for the theory of stochastic (probabilistic,
random) processes, which today has become the principal trend of
research in probability theory. This theory is a fine example of the
organic synthesis of mathematical and natural-science thinking,
when the mathematician has grasped the physical essence of a crucial
scientific problem and finds adequate mathematical language to
fit it.

The idea of such a theory was apparently expressed by J. H. Poincaré,
and the first outlines of it may be found in the work of L. Bachelier,
Fokker, and Planck. However, construction of the mathematically
complete foundations of the theory of stochastic processes is associated
with the names of Kolmogorov and Khinchin. We must note that
solutions to classical problems of probability theory were found to be
closely tied up with the theory of stochastic processes. In Chapter
10 we shall give elements of this new section of probability theory.
Finally, we may mention such fresh fields of application as
reliability theory and queueing theory. Sections 60 to 64 of this book
deal briefly with the content of this new division of science.

During recent decades the role of probability theory in modern
natural science has grown immeasurably. With the advent of
molecular concepts in the structure of matter, probability theory
unavoidably came to the fore in both physics and chemistry. From the
point of view of molecular physics, every substance consists of an
enormous number of small particles in constant motion and in
constant interaction. Little is known about the nature of these
particles, their interaction, mode of motion and so forth. In general
outline, the information about these particles ends with the fact that
their numbers are large and that in a homogeneous body their properties
are similar. It is quite natural, then, that under such conditions the
mathematical methods commonly applied to physical theories were
helpless. For example, the apparatus of differential equations could not
yield serious results in a situation like that. Indeed, neither the
structure nor the laws of interaction between the particles of the
substance had been sufficiently studied. Application of differential
equations in such a situation could only be extremely arbitrary. But
even if this difficulty were eliminated, the enormous number of
particles represents such a formidable barrier to any study of their
motions as to be far beyond the range of the customary equations of
mechanics.

What is more, such an approach is methodologically unsatisfactory.
Indeed, the problem here is not to study the individual particle
motions but to investigate the regularities that arise in assemblies of
large numbers of moving and interacting particles. Now the laws
that arise from large numbers of participating elements have their
own peculiarities and do not reduce to a simple summation of
individual motions. Moreover, such regularities are found, within
certain limits, to be independent of the individual peculiarities of
the participating particles. Quite naturally, fresh mathematical
methods of investigation must be found to study these new regularities.
What demands are to be made? First, undoubtedly they must take into
account that the phenomenon at hand has to do with large numbers;
thus, for these methods, the existence of large numbers of interacting
particles should not represent an additional difficulty but rather a
simplification for the study of such laws. Further, insufficient
knowledge about the nature and structure of the particles and likewise
about the nature of their interactions should not limit the efficacy
of their application. These demands are best satisfied by the methods
of probability theory.

To avoid any misunderstanding, we again stress the following
circumstance. When we say that the apparatus of probability theory
is best suited to the study of molecular phenomena, we do not in
the least wish to say that the philosophical premises for applying
the theory of probability in natural science spring from the
insufficiency of our knowledge. The basic principle lies in the fact
that when studying mass-scale phenomena, a set of new and peculiar
regularities comes to light. When studying phenomena caused by the
action of large numbers of molecules, it is not necessary to take into
consideration all the properties of every molecule. Indeed, the study
of nature requires that inessential details be ignored. If all details,
all existing relationships, including those that are not essential for
the given phenomenon, are considered, then the phenomenon itself
is obscured and it becomes more difficult to grasp the subject because
of such artificial complications.

We can judge how aptly a phenomenon has been outlined and how
successful is the choice of mathematical tools for its study by the
agreement between theory and experiment (practice). The
development of natural sciences, physics in particular, shows that the
apparatus of probability theory has proved exceptionally suited to the
study of numerous phenomena of nature.

The above-mentioned relationship between probability theory and
the requirements of modern physics best explains the reasons why
probability theory during the past few decades has become one of
the most rapidly developing branches of mathematics. Fresh
theoretical results open up new opportunities for applying methods of
probability theory to the natural sciences. Diversified studies of
natural phenomena force probability theory to seek new laws generated
by the factor of chance. Probability theory does not dissociate itself
from the demands of other sciences, but keeps step with the general
advance of the natural sciences. This does not, of course, mean that
probability theory is only an auxiliary tool in the solution of certain
practical problems. Quite the contrary, it must be stressed that during
the past three decades, the theory of probability has developed into
a harmonious mathematical discipline with its own problems and
methods of proof. At the same time it has come to light that the most
essential problems of probability theory serve in the solution of
diverse problems of natural science and practical affairs.





From the very start we defined the theory of probability as a
science concerned with random events. The notion of a random event
will be explained in the first chapter. For the time being we confine
ourselves to a few remarks. In ordinary parlance a random event is
regarded as something extremely rare that runs counter to the
established order of things and the law-governed development of events;
in the theory of probability, we reject this point of view. In
probability theory, random events possess a series of characteristic
peculiarities; for one thing, they all occur in mass-scale phenomena.
By mass-scale phenomena we mean those that occur in assemblages
of large numbers of entities of equal or nearly equal status and are
determined by this mass-scale nature of the phenomenon, depending
only in slight measure on the nature of the component entities.

Like the other divisions of mathematics, the theory of probability
developed out of the demands of practical affairs; in abstract form it
reflects the regularities peculiar to random events of a mass-scale
character. Such regularities play an exceedingly important role in
physics and in other natural sciences, in military affairs, diversified
fields of technology, in economics, and elsewhere. In recent times,
in connection with large-scale production, the results of probability
theory are used not only for locating defective items already produced
but, what is more important, for organizing the very process of
production (statistical quality control in production).

As has already been noted, the relationship between probability
theory and practical requirements has been the basic reason for the
rapid development of probability theory in the past three decades.
Many divisions of the theory evolved in response to the problems
of practical workers. It is fitting here to recall the remarkable words
of the founder of the Russian school of probability theory,
P. L. Chebyshev: “The link-up between theory and practice yields the
most salutary results, and the practical side is not the only one that
benefits; the sciences themselves advance under its influence, for it
opens up to them new objects of investigation or fresh aspects of
familiar objects... If the theory gains much from new applications of
an old method or from its new developments, then it benefits still
more from the discovery of new methods, and in this case too, science
finds itself a true guide in practical affairs.”




Chapter 1. The Concept of Probability


Sec. 1. Certain, Impossible, and Random Events

On the basis of observations and experiment, science arrives at 
laws that govern the course of the phenomena under study. The most 
elementary and widespread scheme of the regularities under study is: 

1. In every realization of a set (complex) of conditions 𝔖 there
occurs an event A.

Thus, to illustrate, if water at atmospheric pressure (760 mm Hg)
is heated above 100 °C (the set of conditions 𝔖), it is transformed
into steam (event A). Or, for any chemical reaction of substances,
without exchange with the surrounding medium (the set of
conditions 𝔖), the total quantity of substance (matter) remains
unchanged (event A). This assertion is called the law of conservation
of matter. The reader will readily be able to point out other laws taken
from physics, chemistry, biology and other sciences.

An event that unavoidably occurs for every realization of the set
of conditions 𝔖 is called certain (sure). If event A definitely cannot
occur upon realization of the set of conditions 𝔖, it is called
impossible. And if, when the set of conditions 𝔖 is realized, event A
may or may not occur, it is called random.

From these definitions it is clear that when speaking of certainty,
impossibility or randomness of some event, we will always have in
view the certainty, impossibility and randomness with respect to
some definite set of conditions 𝔖.

Proposition 1 asserts the certainty of event A upon realization of
a set of conditions 𝔖. Asserting the impossibility of some event upon
the realization of a given set of conditions does not yield anything
essentially new because it readily reduces to assertions of type 1:
the impossibility of event A is tantamount to the certainty of the
opposite event Ā, which consists in the fact that A does not occur.

The mere assertion of the randomness of an event is of very restricted
cognitive interest: it simply amounts to stating that the set of
conditions 𝔖 does not reflect the entire collection of reasons necessary
and sufficient for the occurrence of A. Such an indication cannot be
considered totally devoid of content, for it may serve as a stimulus to
further study of the conditions of occurrence of A, but of itself it does
not yet yield us any positive knowledge.

However, there is a broad range of phenomena in which, given a
repeated realization of the set of conditions 𝔖, the proportion of
cases in which event A occurs only rarely deviates to any
substantial degree from some average value, which can thus serve
as a characteristic indicator of the mass-scale operation (the
repeated realization of the set 𝔖) relative to event A.
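
The stabilization of the proportion described above can be illustrated by a small simulation. What follows is only a sketch under assumptions that are not in the text: the set of conditions is taken to be "throw a fair six-sided die", the event A is "a six turns up" (so the average value in question is 1/6), the trials come from Python's pseudo-random generator with a fixed seed, and the name relative_frequency is purely illustrative.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Fraction of n_trials simulated die throws in which a six turns up.

    Assumption (not from the text): the set of conditions is "throw a
    fair six-sided die" and the event A is "a six turns up".
    """
    rng = random.Random(seed)  # fixed seed so that a run is reproducible
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
    return hits / n_trials


if __name__ == "__main__":
    # The printed proportions should settle near 1/6 as n grows.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, relative_frequency(n))
```

For short series the proportion may wander noticeably; for long series it should stay close to 1/6, which is the kind of behaviour the paragraph above has in view.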

For such phenomena one can give not only a simple statement of 
the randomness of event A, but also a quantitative estimation of 
the possibility of its occurrence. This estimation is expressed by a 
proposition of the type: 

2. The probability that event A will occur upon realization of a set
of conditions 𝔖 is equal to p.

Regularities of this kind are termed probabilistic or stochastic.
Probabilistic regularities play an important role in diverse fields
of science. For instance, there is no way of predicting whether a
given radium atom will decay in a given interval of time or not, but
it is possible, on the basis of experimental findings, to determine the
probability of such decay: an atom of radium decays in a time
interval of t years with a probability

$$p = 1 - e^{-0.000436t}.$$

Here the set of conditions 𝔖 consists in the fact that we consider a
radium atom which for a given number of years is not subjected to
any unusual external actions (like bombardment with high-speed
particles); otherwise its conditions of existence are inessential: it is
of no consequence what the medium is, what temperature it has,
and so on. Event A consists in the fact that the atom will decay in
the given time of t years.
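
The decay formula for radium is easy to evaluate numerically. The sketch below assumes only the coefficient 0.000436 per year quoted in the text; the function names are illustrative, and the notion of a half-life (the time at which the decay probability reaches 1/2) is introduced here for orientation and is not part of the passage.

```python
import math

DECAY_RATE_PER_YEAR = 0.000436  # coefficient quoted in the text for radium


def decay_probability(t_years, rate=DECAY_RATE_PER_YEAR):
    """Probability p = 1 - exp(-rate * t) that an atom decays within t years."""
    if t_years < 0:
        raise ValueError("the time interval must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return -math.expm1(-rate * t_years)


def half_life(rate=DECAY_RATE_PER_YEAR):
    """Time at which the decay probability reaches 1/2, i.e. ln(2) / rate."""
    return math.log(2) / rate


if __name__ == "__main__":
    for t in (1, 100, 1000):
        print(t, decay_probability(t))
    print("half-life in years:", half_life())
```

With this coefficient the half-life comes out at roughly 1590 years, and the probability of decay within a single year is a little under 0.000436.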

The idea which now seems to us quite natural, that the probability
of a random event A, under known conditions, admits of a
quantitative evaluation by means of some number

$$p = P(A),$$

was elaborated in systematic fashion for the first time in the 17th
century in the works of Fermat (1601-1665), Pascal (1623-1662),
Huygens (1629-1695), and in particular James Bernoulli (1654-1705).
Their investigations laid the foundations of the theory of probability.
Since that time, the theory of probability has been under development
as a mathematical discipline and has become enriched with new
important results. Its applicability to the study of actual phenomena of
the most diversified nature continues to find new and brilliant
confirmation.

There can be no doubt that the concept of mathematical
probability warrants a profound philosophical study. The basic specific
philosophical problem advanced by the very existence of probability
theory and its successful application to real phenomena consists
in the following: under what conditions is there objective meaning in
the quantitative estimate of the probability of a random event A with
the aid of a definite number P(A), called the mathematical probability of
event A, and what is the objective meaning of this estimate? A clear
understanding of the relationships between the philosophical categories
of randomness and necessity is an inevitable prerequisite for a
successful analysis of the concept of mathematical probability; but this
analysis cannot be complete without answering the question we have
posed: under what conditions does chance allow for a quantitative
estimate, in the form of a number, of probability.

Every investigator dealing with the application of probability
theory to physics, biology, engineering, economic statistics, or any
other concrete science, actually proceeds from the conviction that
probabilistic judgements express certain objective properties of the
phenomena under study. To assert that under a certain set of conditions
𝔖 the occurrence of an event A has a probability p is to assert that
between the set of conditions 𝔖 and the event A there is a certain
perfectly definite, though peculiar (but no less objective for this
reason), relationship existing independently of the investigator. The
philosophical problem is to elucidate the nature of this relationship.
It is only the difficulty of this problem that has made possible the
paradoxical circumstance that even among scientists who in general
philosophical problems do not take the idealistic stand, one can find
attempts to dismiss the problem (instead of solving it in a positive
fashion) by asserting that probabilistic judgements have to do only
with the state of the investigator (such judgements being regarded
as measuring the degree of his confidence that event A will occur,
and so forth).

The extensive and diversified experience of applying probability
theory in a wide range of fields teaches us that the very problem of a
quantitative estimation of the probability of some event has
reasonable objective meaning only under certain quite definite
conditions.

The definition given above of the randomness of an event A for
a set of conditions 𝔖 is purely negative: the event is random if it
is not necessary and not impossible. From the fortuitous nature of
an event in this purely negative sense it does not in the least follow
that it is meaningful to speak of its probability as a certain definite
number, even if it is one we do not know. In other words, not only
the assertion that “event A has a definite probability P(A) for a set
of conditions 𝔖” but also the simple assertion that this probability
exists is an informative assertion which in each specific case requires
substantiation or, when it is taken as a hypothesis, subsequent
verification.

For example, a physicist encountering a new radioactive element
will from the start assume that for an atom of the element that is
left to itself (i.e., not subject to external influences of extraordinarily
great intensity) there exists a certain probability of decay during
time t, the dependence of which on time is of the form

$$p = 1 - e^{-at},$$

and will strive to determine the coefficient a that characterizes the rate
of decay of the new radioactive element. The question may be posed
of the dependence of the probability of decay on external conditions,
for example on the intensity of cosmic radiation: here the researcher
will proceed from the assumption that to every sufficiently definite
set of external conditions there corresponds a certain definite value of
the coefficient a.
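
To make "determining the coefficient a" concrete, one can solve the relation between p, a and t for a, which gives a = -ln(1 - p)/t. The sketch below does only that; it assumes that the observed fraction of atoms decayed after time t is taken as an estimate of p, which is a simplification introduced here and not a procedure described in the text, and the function name and the sample figures are hypothetical.

```python
import math


def estimate_decay_coefficient(fraction_decayed, t):
    """Solve p = 1 - exp(-a * t) for a, given an observed fraction p.

    Assumption (not from the text): the observed fraction of decayed
    atoms is used directly in place of the probability p.
    """
    if not 0 <= fraction_decayed < 1:
        raise ValueError("fraction_decayed must lie in [0, 1)")
    if t <= 0:
        raise ValueError("t must be positive")
    # log1p(-p) equals ln(1 - p) and stays accurate for small p.
    return -math.log1p(-fraction_decayed) / t


if __name__ == "__main__":
    # Hypothetical observation: 4.2% of the atoms decayed within 100 years.
    print(estimate_decay_coefficient(0.042, 100))
```

Applied to the radium formula quoted in Section 1, this inversion recovers the coefficient 0.000436 from the decay probability for any positive t.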

The situation is exactly the same in all other cases of successful
practical application of the theory of probability. For this reason,
the problem of the philosophical elucidation of the real content of
the concept “mathematical probability” may be made hopeless from
the very start if a definition is required that may be applied to any
event A under any set of conditions 𝔖.

Sec. 2. Different Approaches to the Definition of Probability

A very large number of different definitions of mathematical
probability have been proposed by various authors. We shall not attempt
here to examine all the logical niceties of these many definitions.
Any scientific definition of such basic concepts as probability is
only a subtle logical analysis of a certain store of very simple
observations and practical procedures that have justified themselves by
long and successful employment. Interest in a logically irreproachable
“substantiation” of probability theory arose later, historically
speaking, than the ability to determine the probability of various events,
to perform calculations with these probabilities and also to utilize
the results of the calculations in practical affairs and in scientific
research. For this reason, in most attempts at a scientific definition
of the general concept of probability it is easy to perceive various
aspects of the concrete cognitive process which in each specific case leads
to an actual determination of the probability of a given event, whether
this is the probability of getting a six in four throws of a die or the
probability of radioactive decay or of hitting a target. Some
definitions start from inessential, subsidiary aspects of real processes;
these are totally fruitless. Others advance some one aspect of the
matter or certain modes of actually finding the probability that are not
applicable in all cases. These definitions merit closer examination
despite their one-sided nature.

From the viewpoint thus delineated, most definitions of
mathematical probability may be divided into three groups:

1. Definitions of mathematical probability as a quantitative
measure of the “degree of certainty” of the observer.

2. Definitions that reduce the concept of probability to that of
“equal possibility” as being the most primitive concept (the so-called
classical definition of probability).

3. Definitions that proceed from the “frequency” of occurrence of
an event in a large number of trials (the statistical definition).

Sections 4 and 7 are devoted to the second and third groups. The
definitions of the first group will be critically examined at the end
of this section. If mathematical probability is a quantitative measure
of the degree of certainty of the investigator, then the theory of
probability will reduce to something in the nature of a division of
psychology. Ultimately, a consistent development of such a purely
subjectivistic conception of probability would unavoidably lead to
subjective idealism. Indeed, if it is assumed that the evaluation of
probability is related only to the state of the investigator, then all
conclusions from probabilistic judgements (judgements of type 2) are
stripped of the objective content that is independent of the
investigator. Yet type 2 probabilistic judgements are used as a basis for
many positive conclusions which in no way differ in significance from
conclusions obtained without appeal to probability. For example,
physics derives all the “macroscopic” properties of gases from
suppositions concerning the nature of the probabilities of behaviour of
the individual molecules. If we attribute objective value (that is, value
independent of the investigator) to these conclusions, then in the
initial probabilistic hypotheses concerning the course of “microscopic”
molecular processes it is necessary to perceive something more
important than mere registration of our psychological states that arise
when meditating about molecular motions.

For those who take the stand of reality existing independently of
us and of the fundamental knowability of the external world, and who
also reckon with the fact that probabilistic judgements are
successfully employed in learning about the external world, it must be
absolutely clear that the purely subjective definition of mathematical
probability is quite untenable. This might suffice to complete the
discussion of definitions of the first group if they did not find support
in the original everyday meaning of the word “probability”. The
point is that in ordinary usage the expressions “probably”, “very
probably”, “it is highly improbable”, and so forth do indeed express
simply the attitude of the speaker to the truth or falsity of some
single judgement. We must therefore put stress on a circumstance that
we have not dwelt on as yet. When in Section 1 we immediately
centred attention on type 2 probabilistic regularities by opposing them
to the strict causal regularities of type 1, we acted in full accord with
practical applications of the concept of mathematical probability,
but from the very start we somewhat digressed from the ordinary
“prescientific” meaning of the word “probability”: whereas in all real
scientific applications of probability theory, “probability” is the
probability of the occurrence of some event A, provided that a certain
set of conditions 𝔖, which is fundamentally reproducible an infinite
number of times, has been realized (it is only in such a setting that
the statement

$$p = P(A)$$

expresses a certain regularity with objective meaning), in ordinary
parlance the customary thing is to speak of a greater or lesser
probability of some quite definite judgement. For example, relative to the
judgements:

(a) every even natural number greater than two may be represented
in the form of a sum of two prime numbers (4 = 2 + 2, 6 = 3 + 3,
8 = 5 + 3, etc.);

(b) snow will fall in Moscow on May 7, 1976;

the following may be said: concerning judgement (a) we do not yet have
full knowledge, but many believe it to be extremely likely; one must
expect to get the exact answer to judgement (b) only on 7 May 1976.
However, since snow very rarely falls in Moscow in May, judgement (b)
should for the present be considered highly unlikely.

Indeed, we only attribute a subjective meaning to such statements 
relative to the probability of isolated facts or, in general, specific 
judgements (even though of a general nature): they reflect only the 
attitude of the speaker to the given question. And it is true that when 
speaking about the greater or lesser probability of a specific judgement, 
we ordinarily do not in the least desire to doubt that the law of the 
excluded middle is applicable. For instance, no one doubts that each 
of the propositions (a) and (b) is actually true or false. Even if such
doubts were expressed by the so-called intuitionists concerning
judgement (a), at any rate, to the ordinary mind, the possibility of
speaking of a greater or lesser probability of this proposition is in no
way related to doubts as to whether the law of the excluded middle 
is applicable or not. If proposition (a) is ever proved or refuted, all 
preliminary estimates of its probability made at the present time 
become meaningless. In exactly the same way, when 7 May 1976 
comes, it will be easy to see whether (b) is true or not; if snow falls on 
that day, there will be no sense in giving the view that this event is 
improbable. 





A complete investigation of the extreme diversity of psychic states
of doubt intermediate between a categorical admission and
categorical rejection of an isolated judgement, no matter how interesting
that may be for psychology, would only lead us far astray from our
basic problem of elucidating the meaning of probabilistic regularities
of objective scientific value.

Sec. 3. The Sample Space 

In the preceding section we saw that the definition of mathematical 
probability as a quantitative measure of the “degree of certainty” of the 
investigator does not capture the content of the notion of probability. 
We therefore return to the question of where objective probabilistic 
regularities come from. The classical and statistical definitions of 
probability claim to yield simple and direct answers to this question. 
We shall see later on that both these definitions reflect essential
aspects of the actual content of the notion of probability, though each
one taken separately is insufficient. A full understanding of the nature
of probability demands their synthesis. In the next few sections we
shall deal exclusively with the classical definition of probability 
that proceeds from the idea of equal likelihood as an objective property 
of diverse possible versions of the course of phenomena based on their 
actual symmetry. Henceforward we shall have to do only with such 
a conception of equal likelihood. The definition of probability in 
terms of “equal likelihood” as understood in a purely subjective sense 
of the identical “likelihood” for the investigator is a variant of the 
definitions of probability expressed in terms of “degree of certainty” 
of the observer which we have already dismissed from our consideration. 

Before passing to the classical definition of the notion of probability
we shall make some preliminary remarks. We will consider as fixed
a set of conditions $\mathfrak{S}$ and will examine a certain family S of events
A, B, C, ...*, each of which must occur or not occur upon realization
of the set $\mathfrak{S}$. Certain relationships may obtain between the events
of the family S. Since they will be constantly under study later on,
let us look into them at the outset.

(1) If for every realization of the set of conditions $\mathfrak{S}$ under which
an event A occurs, there also occurs an event B, then we say that A
implies B and we denote this by the symbol

$$A \subset B$$

or

$$B \supset A$$

which means that “B is implied by A”.


* Events will always be designated by capital letters A, B, C, D, etc. 





(2) If A implies B and at the same time B implies A, that is, if
in each realization of the set of conditions $\mathfrak{S}$ events A and B both
occur or both fail to occur, we shall say that events A and B are
equivalent; this will be denoted by the symbol $A = B$.

We note that in all considerations of probability theory, equivalent 
events can replace one another. For this reason, we will agree in the 
future to consider any two equivalent events as simply identical. 



(3) An event which consists in the occurrence of both events A
and B will be called the product of A and B and will be designated
$AB$ (or $A \cap B$).

(4) An event consisting in the occurrence of at least one of the
events A and B will be called the sum of A and B and will be
designated $A + B$ (or $A \cup B$).

(5) An event which consists in the fact that event A occurs but
event B does not will be called the difference of the events A and B
and will be designated $A - B$.

We shall illustrate the newly introduced concepts with simple 
examples. The first is what is known as the Venn diagram. 

Let there be a set of conditions $\mathfrak{S}$ such that a point is chosen at
random within a square, as depicted in Fig. 1. Denote by A the
event that “the chosen point lies inside the circle on the left”, and
by B the event that “the chosen point lies inside the circle on the
right”. Then events $A$, $\bar{A}$, $B$, $\bar{B}$, $A + B$, $AB$ consist in the chosen
point being located in the regions shaded in the appropriate squares
of Fig. 1.





Let us take another illustration. Suppose that the set of conditions
$\mathfrak{S}$ consists in the fact that a die* is tossed once. We denote by A a
roll of six, by B a roll of three, by C a roll of some even number
of points, by D a roll of a number of points that is a multiple of
three. Then events A, B, C and D are connected by the following relations:

$$A \subset C, \quad A \subset D, \quad B \subset D$$

$$A + B = D, \quad CD = A$$
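The relations just listed can be checked mechanically if each event is modelled as the set of die faces favourable to it. The following is only a sketch under that assumption (the set-based model and the helper name `implies` are illustrative, not part of the text): implication becomes the subset relation, the sum becomes union and the product becomes intersection.

```python
# One toss of a die: every event is modelled as a set of faces (an assumption
# made for illustration; the text introduces sample spaces later).
U = frozenset(range(1, 7))                    # certain event
V = frozenset()                               # impossible event

A = frozenset({6})                            # a roll of six
B = frozenset({3})                            # a roll of three
C = frozenset(k for k in U if k % 2 == 0)     # an even number of points
D = frozenset(k for k in U if k % 3 == 0)     # a multiple of three


def implies(x, y):
    """Event x implies event y when every outcome of x belongs to y."""
    return x <= y


relations = {
    "A implies C": implies(A, C),
    "A implies D": implies(A, D),
    "B implies D": implies(B, D),
    "A + B = D": (A | B) == D,    # sum of events as set union
    "CD = A": (C & D) == A,       # product of events as set intersection
}
for name, holds in relations.items():
    print(name, holds)
```

In this model the difference of two events is the set difference, so `C - A` is the event "a roll of two or four".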

The definition of a sum and product of two events is generalized 
to any number of events: 

$$A + B + \dots + N$$

This signifies an event that consists in the occurrence of at least
one of the events A, B, ..., N, and

$$AB \dots N$$

signifies an event consisting in the occurrence of all events A, B, ..., N.

(6) An event is called certain if it of necessity must occur (upon
realization of the set of conditions $\mathfrak{S}$). For example, in throwing two
dice the total sum of points must certainly be at least two.

An event is called impossible if it definitely cannot occur (no matter
what the realization of the set of conditions $\mathfrak{S}$). For instance, the
throwing of two dice can never yield a sum of thirteen points.

All certain events are obviously equivalent. It is therefore
justifiable to denote all certain events by a single letter. For this
purpose we shall use the letter U. All impossible events are likewise
equivalent. We denote an impossible event by V.

(7) Two events $A$ and $\bar{A}$ are termed contrary if for them the two
following relationships hold simultaneously:

$$A + \bar{A} = U, \quad A\bar{A} = V$$

For example, if in throwing a die C signifies a roll of an even number
of points, then

$$U - C = \bar{C}$$

is an event consisting in a roll of an odd number of points.

(8) Two events A and B are called mutually exclusive if their joint
occurrence is impossible, that is, if

$$AB = V$$

If

$$A = B_1 + B_2 + \dots + B_n$$

and events $B_i$ are mutually exclusive in pairs, i.e.,

$$B_i B_j = V \quad \text{for } i \neq j$$

then we say that event A is decomposable into the particular events
$B_1, B_2, \dots, B_n$. To illustrate, when tossing a die, event C, which
consists in a roll of an even number of points, is decomposable into the
special events $E_2$, $E_4$, $E_6$, which are, respectively, rolls of 2, 4 and 6.

* A die is a cube the faces of which are numbered 1, 2, 3, 4, 5 and 6.

The events $B_1, B_2, \dots, B_n$ form a complete group of events if at
least one of them must definitely occur (for each realization of the
set $\mathfrak{S}$); that is, if

$$B_1 + B_2 + \dots + B_n = U$$

Of special significance to us in the sequel will be complete groups
of pairwise mutually exclusive events. Such, for example, in a single
toss of one die, is the family of events

$$E_1, E_2, E_3, E_4, E_5, E_6$$

which consist in rolls of 1, 2, 3, 4, 5, and 6 points, respectively.

(9) Every problem in probability theory involves a certain set of
conditions $\mathfrak{S}$ and a certain family S of events that occur or do not
occur upon every realization of the set of conditions $\mathfrak{S}$. It is advisable
to make the following assumptions relative to such a system:

(a) if the family S includes events A and B, it also includes events
$AB$, $A + B$, $A - B$;

(b) the family S contains a certain and an impossible event.

A family of events that satisfies these assumptions is called a field 
of events. 

In the examples that we have used for illustrations, it was always
possible to isolate events that could not be decomposed into simpler
ones: a certain face turning up in the throw of a die; landing on a
definite point in the square when considering Venn’s diagram. Let us
agree to call such indecomposable events simple (elementary) events.

In constructing a mathematical theory of probability, our intuitive
conceptions demand greater formalization than heretofore. In
the modern expositions of probability theory, the starting point is
a set of simple events, or, as it is generally termed, a space of simple
events (a sample space). The nature of the elements of this space is
not specified beforehand inasmuch as it is important to have a
sufficiently broad choice so as to embrace all possible cases. For instance,
the elements of the space may be points of Euclidean space, functions
of one or several variables, and so forth. The sets of points
of a sample space form random events. An event which consists of
all the points of a sample space is called a certain (sure) event.
Everything that we have said about the relations between random events
in this section also holds true for a formal construction of the theory.
We shall return to this system of exposition somewhat later, in
Section 8. In the next section we will confine ourselves to sample spaces
consisting of a finite number of elements.

Here we shall confine ourselves to stating the following laws that 
hold for random events: 

Commutative law: $A + B = B + A$, $AB = BA$.

Associative law: $A + (B + C) = (A + B) + C$, $A(BC) = (AB)C$.

Distributive law: $A(B + C) = AB + AC$, $A + (BC) = (A + B)(A + C)$.

Idempotency law: $A + A = A$, $AA = A$.

We leave the proof of these laws to the reader. For anyone acquainted
with elementary set theory there will be no difficulty.
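A brute-force check is no substitute for the proof, but it can make the four laws concrete. The sketch below assumes a small hypothetical sample space of four points and reads the sum as set union and the product as set intersection; it then tests every triple of events.

```python
from itertools import combinations, product

outcomes = (1, 2, 3, 4)   # a small hypothetical sample space
# Every subset of the sample space is treated as an event.
events = [frozenset(c) for r in range(len(outcomes) + 1)
          for c in combinations(outcomes, r)]


def laws_hold(A, B, C):
    """Check the four laws for one triple of events (+ is |, product is &)."""
    return (
        (A | B) == (B | A) and (A & B) == (B & A)        # commutative
        and (A | (B | C)) == ((A | B) | C)               # associative
        and (A & (B & C)) == ((A & B) & C)
        and (A & (B | C)) == ((A & B) | (A & C))         # distributive
        and (A | (B & C)) == ((A | B) & (A | C))
        and (A | A) == A and (A & A) == A                # idempotency
    )


all_ok = all(laws_hold(A, B, C) for A, B, C in product(events, repeat=3))
print(len(events), all_ok)
```

The check covers only this one finite space; the general statement still rests on the set-theoretic argument the text leaves to the reader.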

Sec. 4. The Classical Definition of Probability 

The classical definition of probability reduces the concept of
probability to the notion of equal probability (equal likelihood) of events,
which is considered basic and is not subject to a formal definition.
To illustrate, when tossing a die that has the exact shape of a cube
and is made of completely homogeneous material, equally probable
events are rolls of any specific number of points (1, 2, 3, 4, 5, 6) marked
on the faces of the cube, since by virtue of symmetry no face has an
objective advantage over any of the others.

In the general case, we consider some group G consisting of n
mutually exclusive equally probable events (we call them simple events):

$$E_1, E_2, \dots, E_n$$

We now form a family S consisting of the impossible event V, all
events of group G and all events A that may be decomposed into
special cases belonging to the group G.

For example, if G consists of three events $E_1$, $E_2$ and $E_3$, then
system S includes the events* $V$, $E_1$, $E_2$, $E_3$, $E_1 + E_2$, $E_2 + E_3$,
$E_1 + E_3$, $U = E_1 + E_2 + E_3$.

It is readily established that the family S is a field of events.
Indeed, it is obvious that the sum, difference and product of events
of S are included in S; the impossible event V belongs to S by
definition, and the certain event U belongs to S, since it is represented
in the form

$$U = E_1 + E_2 + \dots + E_n$$
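The construction of the family S can be imitated on a computer for small n. In the sketch below each event of S is represented by the set of indices of the simple events it is decomposed into; this representation and the function name `field_from_group` are assumptions made for illustration.

```python
from itertools import combinations


def field_from_group(n):
    """All sums of the simple events E_1..E_n, each stored as a set of indices."""
    indices = range(1, n + 1)
    return {frozenset(c) for r in range(n + 1) for c in combinations(indices, r)}


S = field_from_group(3)
U = frozenset({1, 2, 3})   # certain event E_1 + E_2 + E_3
V = frozenset()            # impossible event (empty sum)

# Closure under sum, product and difference, as required of a field of events.
closed = all((A | B) in S and (A & B) in S and (A - B) in S
             for A in S for B in S)
print(len(S), closed, U in S, V in S)
```

For n = 3 the family has eight members, and in general the enumeration produces one event for every subset of the n indices.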

The classical definition of probability is given for events of family 
S and may be formulated as follows: 


If event A is decomposable into m special cases belonging to a complete
group of n mutually exclusive and equally probable events, the probability
P(A) of event A is

$$P(A) = \frac{m}{n}$$

* These eight events exhaust the family S provided we do not distinguish (as
we agreed in Sec. 3) equivalent events. It will readily be shown
that in the general case of a group G of n events, the family S consists of $2^n$
events.


For example, in a single toss of a die, the complete group of
mutually exclusive and equally probable events comprises the events

$$E_1, E_2, E_3, E_4, E_5, E_6$$

which consist of rolls of 1, 2, 3, 4, 5, 6 points, respectively. Event

$$C = E_2 + E_4 + E_6$$

which consists of a roll of an even number of points is made up of three
special cases that are components of the complete group of mutually
exclusive and equally probable events. Therefore, the probability
of event C is

$$P(C) = \frac{3}{6} = \frac{1}{2}$$

It is also obvious that by virtue of the accepted definition

$$P(E_1) = \frac{1}{6}, \quad P(E_1 + E_2) = \frac{2}{6} = \frac{1}{3}$$

and so on.

The theory of probability makes wide use of the following
terminology which we will need later on. Suppose that in order to find out
whether an event A (say rolls of multiples of three) occurs or does
not occur it is necessary to make a trial (that is, realize a set of
conditions $\mathfrak{S}$) that would give us the answer (in our case, throwing a
die). The complete group of mutually exclusive and equally probable
events that can occur in such a trial is called the complete group of
possible outcomes of the trial. The possible outcomes of a trial into
which event A is decomposed are called the outcomes of the trial
that are favourable to A. Employing this terminology, we can say
that the probability P(A) of event A is equal to the number of possible
trial outcomes favourable to A divided by the number of all possible
outcomes.

Such a definition quite naturally presumes that the separate
possible outcomes are equally probable.
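As a minimal sketch of this counting rule, the function below divides the number of favourable outcomes by the number of all outcomes, using exact fractions. The function name `classical_probability` is hypothetical, and the equal probability of the outcomes is assumed, exactly as the definition requires.

```python
from fractions import Fraction


def classical_probability(event, outcomes):
    """m / n: favourable outcomes over all equally probable outcomes."""
    outcomes = set(outcomes)
    favourable = set(event) & outcomes   # ignore anything outside the trial
    return Fraction(len(favourable), len(outcomes))


die = range(1, 7)
p_even = classical_probability({2, 4, 6}, die)             # event C
p_multiple_of_three = classical_probability({3, 6}, die)   # rolls of 3 or 6
print(p_even, p_multiple_of_three)
```

Using `Fraction` rather than floating-point division keeps results such as 3/6 = 1/2 exact.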

Now let us consider the throwing of two dice. If the dice are true,
a roll of each of the 36 possible combinations of numbers on both
dice may be considered equally probable. Say, the probability of
rolling 12 is equal to 1/36. A roll of 11 is possible in two ways: 5 on
the first die, 6 on the second, or 6 on the first and 5 on the second.
Therefore the probability of a roll of eleven is 2/36 = 1/18. The reader
will find it easy to verify that the probability of any specific roll
(any sum) is given by the following table:


TABLE 1

Sum          |  2   |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  10  |  11  |  12
Probability  | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36
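The table can be verified by enumerating the 36 equally probable outcomes directly. The following sketch assumes true dice, as the text does; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally probable outcomes
counts = Counter(a + b for a, b in rolls)      # favourable outcomes per sum
table = {s: Fraction(counts[s], len(rolls)) for s in sorted(counts)}

for s, p in table.items():
    # Show the unreduced count over 36 next to the reduced fraction.
    print(s, f"{counts[s]}/36", p)
```

The printed fractions are reduced automatically, so 2/36 appears as 1/18, in agreement with the value found for a roll of eleven.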


In accord with this definition, to every event A that belongs
to the thus constructed field of events S there is ascribed a very
definite probability

$$P(A) = \frac{m}{n}$$

where m is the number of those events $E_i$ of the initial group G
that are special cases of event A. Thus, the probability P(A) may
be regarded as a function of the event A defined on the field of
events S.

This function possesses the following properties: 

1. For every event A of field S

$$P(A) \ge 0$$

2. For the certain event U

$$P(U) = 1$$

3. If event A is decomposable into special cases B and C and all
three events A, B and C belong to the field S, then

$$P(A) = P(B) + P(C)$$

This property is called the theorem of addition of probabilities.
Property 1 is obvious since the fraction $\frac{m}{n}$ cannot be negative.

Property 2 is no less obvious since all the n possible outcomes of
a trial are favourable to the certain event U, and therefore

$$P(U) = \frac{n}{n} = 1$$

We shall prove Property 3. Let event B be favoured by $m'$
events, and event C by $m''$ events of group G. Since by assumption
the events B and C are mutually exclusive, the events $E_i$ which
favour one of them are different from the events $E_i$ that favour the
other. There is thus a total of $m' + m''$ events $E_i$ favourable to the
occurrence of one of the events B and C, i.e., favourable to the
event $B + C = A$. Hence,

$$P(A) = \frac{m' + m''}{n} = \frac{m'}{n} + \frac{m''}{n} = P(B) + P(C)$$

which is what we had to prove.

We restrict ourselves here to a few more properties of probability.

4. The probability of event $\bar{A}$, which is the opposite of event A, is

$$P(\bar{A}) = 1 - P(A)$$

Indeed, since $A + \bar{A} = U$, we have, from the already proved
Property 2,

$$P(A + \bar{A}) = 1$$

and since events $A$ and $\bar{A}$ are mutually exclusive, by Property 3

$$P(A + \bar{A}) = P(A) + P(\bar{A})$$

The last two equations prove our proposition.

5. The probability of the impossible event is zero.

Indeed, events U and V are mutually exclusive and $U + V = U$, therefore

$$P(U) + P(V) = P(U)$$

whence it follows that

$$P(V) = 0$$

6. If event A implies event B, then

$$P(A) \le P(B)$$

Indeed, event B may be represented as the sum of two events $A$
and $\bar{A}B$. From this, by virtue of Properties 3 and 1, we have

$$P(B) = P(A + \bar{A}B) = P(A) + P(\bar{A}B) \ge P(A)$$

7. The probability of any event lies between zero and unity.

From the fact that for any event A the relations

$$V \subset A + V = A = AU \subset U$$

hold, there follow, by virtue of the preceding property, the
inequalities

$$0 = P(V) \le P(A) \le P(U) = 1$$
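Properties 3, 4, 6 and 7 can be spot-checked on the field of all events generated by one toss of a die. This is only an illustrative sketch (events are modelled as sets of faces, and the names `P` and `checks` are assumptions); it confirms the properties for this one field and proves nothing in general.

```python
from fractions import Fraction
from itertools import combinations

U = frozenset(range(1, 7))   # certain event for one toss of a die
events = [frozenset(c) for r in range(7) for c in combinations(sorted(U), r)]


def P(event):
    """Classical probability: favourable faces over all six faces."""
    return Fraction(len(event), len(U))


checks = {
    # addition theorem for mutually exclusive B and C
    "property 3": all(P(B | C) == P(B) + P(C)
                      for B in events for C in events if not (B & C)),
    # probability of the opposite event
    "property 4": all(P(U - A) == 1 - P(A) for A in events),
    # A implies B means A is a subset of B
    "property 6": all(P(A) <= P(B) for A in events for B in events if A <= B),
    "property 7": all(0 <= P(A) <= 1 for A in events),
}
print(checks)
```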

Sec. 5. The Classical Definition of Probability. Examples 

We now consider some cases in calculating the probabilities of
events using the classical definition of probability. The examples
are strictly illustrative in character and do not claim to exhaust
all basic methods of calculating probabilities.

Example 1. From a deck of 36 cards draw three at random. Find 
the probability that there will be exactly one ace among them. 

Solution. The complete group of equally probable and mutually
exclusive events in our problem consists of all possible combinations
of three cards; their number is $C_{36}^{3}$. The number of favourable events
may be computed as follows. One ace may be drawn in $C_{4}^{1}$ different
ways, while the two other cards (non-aces) may be drawn in $C_{32}^{2}$
different ways. Since for each definite ace the two remaining cards may
be chosen in $C_{32}^{2}$ ways, there will be a total of $C_{4}^{1} \cdot C_{32}^{2}$ favourable
cases. Thus, the desired probability will be

$$p = \frac{C_{4}^{1} \cdot C_{32}^{2}}{C_{36}^{3}} = \frac{\frac{4}{1} \cdot \frac{32 \cdot 31}{1 \cdot 2}}{\frac{36 \cdot 35 \cdot 34}{1 \cdot 2 \cdot 3}} = \frac{31 \cdot 16}{35 \cdot 3 \cdot 17} = \frac{496}{1785} \approx 0.2778$$

which is slightly more than 0.25.
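The arithmetic of Example 1 is easy to reproduce with exact binomial coefficients. The sketch below assumes a deck of 36 cards containing 4 aces, as in the example, and uses `math.comb` (available from Python 3.8).

```python
from fractions import Fraction
from math import comb

total = comb(36, 3)                      # all ways to draw three cards
favourable = comb(4, 1) * comb(32, 2)    # one ace and two non-aces
p = Fraction(favourable, total)
print(p, float(p))
```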

Example 2. From a deck of 36 cards draw three at random. Find 
the probability that there will be at least one ace. 

Solution. Denote the event we are interested in by A: it may be
represented in the form of a sum of the three following mutually
exclusive events: $A_1$ the occurrence of one ace, $A_2$ the occurrence of
two aces, $A_3$ the occurrence of three aces.

By reasoning similar to that given in the solution of the previous
problem it is easy to establish that the number of cases favourable
to the

event $A_1$ is $C_{4}^{1} \cdot C_{32}^{2}$
event $A_2$ is $C_{4}^{2} \cdot C_{32}^{1}$
event $A_3$ is $C_{4}^{3} \cdot C_{32}^{0}$

Since the number of all possible cases is $C_{36}^{3}$, we have

$$P(A_1) = \frac{C_{4}^{1} \cdot C_{32}^{2}}{C_{36}^{3}} = \frac{16 \cdot 31}{3 \cdot 35 \cdot 17} \approx 0.2778$$

$$P(A_2) = \frac{C_{4}^{2} \cdot C_{32}^{1}}{C_{36}^{3}} = \frac{3 \cdot 16}{3 \cdot 35 \cdot 17} \approx 0.0269$$

$$P(A_3) = \frac{C_{4}^{3} \cdot C_{32}^{0}}{C_{36}^{3}} = \frac{1}{3 \cdot 35 \cdot 17} \approx 0.0006$$

By virtue of the addition theorem,

$$P(A) = P(A_1) + P(A_2) + P(A_3) = \frac{109}{3 \cdot 119} \approx 0.3053$$





This example may be solved in yet another way. Event $\bar{A}$, the
opposite of A, consists in the fact that there will not be a single
ace among the drawn cards. Obviously, three non-aces can be drawn
from a deck of 36 cards in $C_{32}^{3}$ different ways and, hence,

$$P(\bar{A}) = \frac{C_{32}^{3}}{C_{36}^{3}} = \frac{32 \cdot 31 \cdot 30}{36 \cdot 35 \cdot 34} = \frac{31 \cdot 8}{3 \cdot 17 \cdot 7} \approx 0.6947$$

The desired probability is

$$P(A) = 1 - P(\bar{A}) \approx 0.3053$$

Note. In both instances the expression “at random” means that 
all possible combinations of three cards are equally probable. 
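Both routes to the answer of Example 2 can be compared numerically. The sketch below again assumes 4 aces in a deck of 36 cards; the dictionary `p_exactly` and the other names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(36, 3)
# P(A_k): exactly k aces among the three cards drawn, for k = 1, 2, 3.
p_exactly = {k: Fraction(comb(4, k) * comb(32, 3 - k), total) for k in (1, 2, 3)}
p_direct = sum(p_exactly.values())                # addition theorem
p_complement = 1 - Fraction(comb(32, 3), total)   # 1 - P(no aces at all)
print(p_direct, p_complement, float(p_direct))
```

The two values agree exactly, which is a useful check that the three events really are mutually exclusive and together exhaust "at least one ace".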

Example 3. A deck of 36 cards is divided at random into two 
equal parts. What is the probability that both parts will have an 
equal number of red and black cards? 

The expression “at random” means that all possible divisions of 
the deck into two equal parts are equally likely. 

Solution. We have to find the probability that of 18 cards drawn
at random from the deck 9 will be red and 9 black.

The total number of different ways to draw 18 cards from 36 is
$C_{36}^{18}$. The favourable ways are those in which there will be 9 cards
drawn from 18 red cards and 9 from 18 black cards. Nine red
cards may be drawn in $C_{18}^{9}$ different ways and 9 black cards also
in $C_{18}^{9}$ different ways. Since in drawing 9 definite red cards, 9 black
ones may be drawn in $C_{18}^{9}$ different ways, the total number of
favourable ways is equal to $C_{18}^{9} \cdot C_{18}^{9}$. And consequently the sought-for
probability is

$$p = \frac{C_{18}^{9} \cdot C_{18}^{9}}{C_{36}^{18}} = \frac{(18!)^4}{36! \, (9!)^4}$$

In order to get an idea of the magnitude of this probability without
performing arduous computations, we make use of Stirling’s formula
which yields the following asymptotic relation:

$$n! \approx \sqrt{2\pi n} \, n^n e^{-n}$$

We thus have

$$18! \approx 18^{18} e^{-18} \sqrt{2\pi \cdot 18}$$

$$9! \approx 9^{9} e^{-9} \sqrt{2\pi \cdot 9}$$

$$36! \approx 36^{36} e^{-36} \sqrt{2\pi \cdot 36}$$

and consequently

$$p \approx \frac{\left(\sqrt{2\pi \cdot 18} \cdot 18^{18} e^{-18}\right)^4}{\sqrt{2\pi \cdot 36} \cdot 36^{36} e^{-36} \left(\sqrt{2\pi \cdot 9} \cdot 9^{9} e^{-9}\right)^4}$$

After some simplifications we find that

$$p \approx \frac{2}{\sqrt{18\pi}} \approx 0.26$$
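With a computer the "arduous computation" is immediate, so the quality of the Stirling estimate can be judged directly. The sketch below computes the exact probability and the estimate obtained by substituting Stirling's formula for each factorial; the helper name `stirling` is an assumption. When the substitution is carried out by hand, the powers and exponentials cancel and leave 2/sqrt(18*pi), which the last line prints for comparison.

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt


def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * n**n * e**(-n) to n!."""
    return sqrt(2 * pi * n) * n ** n * exp(-n)


exact = Fraction(comb(18, 9) ** 2, comb(36, 18))
approx = stirling(18) ** 4 / (stirling(36) * stirling(9) ** 4)
print(float(exact), approx, 2 / sqrt(18 * pi))
```

The estimate slightly overshoots the exact value, but both round to 0.26.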

Example 4. There are n particles, each of which can occupy
each of N (N > n) cells with the same probability 1/N. Find the
probability that: (1) there will be one particle in each of n definite cells,
(2) there will be one particle in each of n arbitrary cells.

Solution. This problem plays an important role in modern
statistical physics; and depending on how the complete group of equally
probable events is formed we have one or another physical statistics:
Boltzmann, Bose-Einstein, Fermi-Dirac.

In Boltzmann statistics, any thinkable distributions that differ
not only as to number but also as to the individuality of the particles
are equally probable: each cell can accommodate any number of
particles from 0 to n.

The total number of possible distributions may be computed in the
following way: each particle may be located in each of the N cells;
hence, n particles may be distributed in the cells in $N^n$ different
ways.

In the first question, the number of favourable cases will obviously
be $n!$ and, consequently, the probability that there will be one particle
in each of n definite cells is

$$p_1 = \frac{n!}{N^n}$$

In the second question, the number of favourable cases is $C_N^n$ times
greater and hence the probability that there will be one particle
in each of n arbitrary cells is equal to

$$p_2 = \frac{C_N^n \cdot n!}{N^n} = \frac{N!}{N^n (N - n)!}$$

In Bose-Einstein statistics identical cases are those in which the
particles change places among the cells (the only important thing
is how many particles there are in a cell but not the individuality
of the particles), and the complete group of equally probable events
consists of all possible distributions of n particles in N cells, one
distribution being the whole class of Boltzmann distributions, which
differ not in numbers of particles in specific cells but only in the
identity of the particles themselves. To get a clear-cut idea of the
difference between Boltzmann statistics and Bose-Einstein statistics
consider a special case: N = 4, n = 2. All possible distributions in this
case may be written in the form of a table (see below) in which a
and b are the names of the particles. In Boltzmann statistics all
16 possibilities are different equally probable events, while in
Bose-Einstein statistics the cases 5 and 11, 6 and 12, 7 and 13, 8 and 14,
9 and 15, 10 and 16 are identical pairs and we have a group of 10
equally probable events.

Now compute the total number of equally probable cases in
Bose-Einstein statistics. We note that all possible distributions of particles
in cells may be obtained as follows: arrange the cells in a straight
line, one after the other, then arrange the particles on the same
straight line one next to the other. Now consider all possible permutations
of particles and partitions between the cells. It is then easy to see that
all possible fillings of cells differing both as to order of particles in
the cells and as to order of partitions will be taken into account.

The number of these permutations is $(N + n - 1)!$. They include
identical permutations, in which each distribution among the cells
is counted $(N - 1)!$ times, since we distinguished the partitions
between the cells, and also counted each distribution in the cells $n!$ times,
because we took into account not only the number of particles in a
cell but also the kind of particles and their order. We thus counted
every cell distribution $n! \, (N - 1)!$ times, whence the number of
different (in the Bose-Einstein sense) distributions of particles in the cells is

$$\frac{(n + N - 1)!}{n! \, (N - 1)!}$$

[TABLE 2: the 16 possible distributions of particles a and b among the cells for N = 4, n = 2]

Thus the number of equally probable events in a complete
group of events has been found. It is now easy for us to answer
the questions of our problem. In Bose-Einstein statistics the
probabilities $p_1$ and $p_2$ are

$$p_1 = \frac{1}{\frac{(n + N - 1)!}{n! \, (N - 1)!}} = \frac{n! \, (N - 1)!}{(n + N - 1)!}$$

$$p_2 = \frac{C_N^n}{\frac{(n + N - 1)!}{n! \, (N - 1)!}} = \frac{N! \, (N - 1)!}{(N - n)! \, (N + n - 1)!}$$

Finally, we consider Fermi-Dirac statistics. According to this
statistics, each cell accommodates either one particle or none, and the
individuality of the particles is ignored.

The total number of distinct particle distributions in cells in
Fermi-Dirac statistics is computed with ease: the first particle may be
placed in $N$ different ways, the second only in $N - 1$, the third in
$N - 2$ and, finally, the nth in $N - n + 1$ different ways. Here,
modes of distribution that differ only in the permutation of particles
among the cells are counted as different. To eliminate particle
individuality we must divide the number thus obtained by $n!$.

Then n particles will be arranged in N cells in

$$\frac{N (N - 1) \cdots (N - n + 1)}{n!} = \frac{N!}{n! \, (N - n)!}$$

distinct equally probable ways.

It is easy to figure out that in Fermi-Dirac statistics the
sought-for probabilities are

$$p_1 = \frac{(N - n)! \, n!}{N!}, \quad p_2 = 1$$

The foregoing example shows how important it is to define exactly 
what events are considered equally probable in a problem. 
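The three ways of forming the complete group of equally probable events can be compared by direct enumeration for small N and n. The sketch below is an illustration under stated assumptions: the function name `probabilities`, the string labels for the statistics and the choice of cells 0 to n-1 as the "definite" cells are all hypothetical.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product


def probabilities(N, n, statistics):
    """Return (p1, p2) by enumerating the equally probable distributions."""
    cells = range(N)
    if statistics == "boltzmann":          # particles are distinguishable
        cases = list(product(cells, repeat=n))
    elif statistics == "bose-einstein":    # only occupation numbers matter
        cases = list(combinations_with_replacement(cells, n))
    elif statistics == "fermi-dirac":      # at most one particle per cell
        cases = list(combinations(cells, n))
    else:
        raise ValueError(statistics)
    definite = set(range(n))               # the n definite cells: 0 .. n-1
    # Cases in which n different cells hold exactly one particle each.
    one_each = [c for c in cases if len(set(c)) == n]
    in_definite = [c for c in one_each if set(c) == definite]
    return (Fraction(len(in_definite), len(cases)),
            Fraction(len(one_each), len(cases)))


for name in ("boltzmann", "bose-einstein", "fermi-dirac"):
    print(name, probabilities(4, 2, name))
```

For N = 4, n = 2 the enumeration yields 16, 10 and 6 equally probable cases respectively, so the same two questions receive three different pairs of answers.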

Example 5. At a theatre ticket office there is a queue of 2n persons; n
have only five-ruble bills and the remaining n have only 10-ruble
bills. The ticket seller has no change to begin with and each customer
takes only one 5-ruble ticket. What is the probability that not a single
customer will have to wait for change?

Solution. All possible arrangements of customers are equally
probable. We employ the following geometric procedure: consider the
xy-plane and assume that the customers are arranged along points
of the x-axis with coordinates 1, 2, ..., 2n as they stand in the queue.
The ticket office is located at the origin. To each person with a 10-ruble
bill we ascribe an ordinate of 1 and to each with a 5-ruble bill an
ordinate of −1. Add from left to right the ordinates thus defined at
integral points and plot at each of them the sum obtained (Fig. 2). It will
readily be seen that at the point with abscissa 2n the sum is 0 (there
will be n terms equal to +1, and n terms equal to −1). Now connect
with straight lines the adjacent points thus obtained and also connect
the origin with the leftmost point. The broken line thus obtained
will be called the trajectory.











Fig. 2


The total number of distinct possible trajectories, as will be readily
seen, is $C_{2n}^{n}$ (it is equal to the number of all possible orderings of n
ascents among 2n descents and ascents). The favourable trajectories
in our case will be those that do not rise above the abscissa axis
(otherwise at least one customer will approach the ticket seller when
there is no change available).

Calculate the total number of trajectories which at least once reach
or cross the line $y = 1$. For this purpose we construct a new
(fictitious) trajectory as follows: prior to first contact with the line
$y = 1$ the new trajectory will coincide with the original one; from the
point of contact the new trajectory is a mirror image of the old
trajectory relative to the line $y = 1$ (see the dashed broken line in Fig. 2).
It is easy to see that the new trajectory differs from the original one
only for trajectories that have at least once reached the line $y = 1$,
while for the remaining trajectories (i.e., those favourable to our
event) it coincides with the original one. Further, the new trajectory
begins at point (0, 0) and ends at point (2n, 2). It thus has two more
single ascents than descents. Hence the total number of new trajectories
is $C_{2n}^{n+1}$ (the number of orderings of n + 1 ascents among 2n ascents
and descents). Thus the number of favourable cases is
$C_{2n}^{n} - C_{2n}^{n+1}$ and the sought-for probability is

$$p = \frac{C_{2n}^{n} - C_{2n}^{n+1}}{C_{2n}^{n}} = 1 - \frac{n}{n+1} = \frac{1}{n+1}$$
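For small queues the reflection argument can be checked by listing every arrangement and following its trajectory. The sketch below is a brute-force illustration, not the method of the text; the function name `no_wait_probability` is hypothetical, and arrangements are identified with the choice of positions occupied by the customers holding 10-ruble bills.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def no_wait_probability(n):
    """Probability that nobody waits for change in a queue of 2n customers."""
    favourable = 0
    for tens in combinations(range(2 * n), n):   # positions of 10-ruble bills
        tens = set(tens)
        height = 0                               # current trajectory ordinate
        for position in range(2 * n):
            height += 1 if position in tens else -1
            if height > 0:                       # trajectory rose above the axis
                break
        else:
            favourable += 1                      # never rose above the axis
    return Fraction(favourable, comb(2 * n, n))


for n in range(1, 6):
    print(n, no_wait_probability(n))
```

The enumeration grows as the number of arrangements, so it is practical only for small n, whereas the formula 1/(n + 1) holds for every n.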





Sec. 6. Geometrical Probability 


From the very beginning of the development of probability theory
it was noticed that the “classical” definition of probability based on
the consideration of a finite group of equally probable events was
insufficient. Even at that time special examples led to a certain
modification of the definition and to the construction of a concept of
probability for cases in which even an infinite set of outcomes is
conceivable. As before, the concept of an “equal probability” of some
events played the basic role.

The general problem that was posed and that led to an extension 
of the notion of probability may be formulated as follows. 

On a plane, let there be a certain region G and in it another region 
g with a rectifiable boundary. A point is thrown at random onto G 
and we wish to find the probability that the point will fall in region 
g. Here, the expression “a point is thrown at random onto region G” 
is given the following meaning: the thrown point can fall in any point 
of G, the probability of falling in any portion of region G is proportional
to the measure of this part (length, area, etc.) and is independent
of its position and shape.

Thus, by definition, the probability that a point thrown randomly 
onto G will fall in g is equal to 

$$p = \frac{\operatorname{mes} g}{\operatorname{mes} G}$$


Let us consider some examples.

Example 1. The Encounter Problem. Two persons A and B have 
agreed to meet at a definite spot between 12 and one o’clock. The 
first one to come waits for 20 minutes 
and then leaves. What is the probability 
of a meeting between A and B if the 
arrival of each during the indicated hour 
can occur at random and the times of 
arrival are independent*?

Solution. Denote the times of arrival of
A by x and of B by y. For the meeting
to take place, it is necessary and sufficient
that

$$|x - y| \le 20$$

We depict x and y as Cartesian coordinates
in the plane; for the scale unit we take one minute. All possible
outcomes will be described as points of a square with side 60;
favourable outcomes will lie in the shaded region (Fig. 3).



* That is, the time of arrival of one person does not affect the arrival time 
of the other. 





Chap. 1. The Concept of Probability 


The desired probability * is equal to the ratio of the area of the 
shaded figure to the area of the whole square: 

$$p = \frac{60^2 - 40^2}{60^2} = \frac{5}{9}$$
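The area ratio can be compared with a simulated frequency. This is a minimal sketch under the assumptions of the example (arrival times independent and uniform over the hour); the function names, the number of trials and the seed are illustrative choices.

```python
import random


def exact_probability(wait=20.0, window=60.0):
    """Area of the strip |x - y| <= wait divided by the area of the square."""
    return (window ** 2 - (window - wait) ** 2) / window ** 2


def meeting_probability(wait=20.0, window=60.0, trials=200_000, seed=1):
    """Frequency of meetings when both arrival times are uniform on [0, window]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, window)   # arrival time of A, in minutes
        y = rng.uniform(0.0, window)   # arrival time of B, in minutes
        if abs(x - y) <= wait:
            hits += 1
    return hits / trials


print(exact_probability())      # the ratio 5/9
print(meeting_probability())    # simulated frequency, close to 5/9
```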

Some research engineers have applied the encounter problem to 
solving problems in the organization of production. A workman 
is in charge of a number of machines of one type, each of which can 
at random moments demand his attention. It may happen that when 
he is busy with one machine, others will be needing his attention. 
It is required to find the probability of this event; that is, in other 
words, to find how long on the average the machine waits for the workman
(or the time the machine is idle). However, it may be noted that
the scheme of the encounter problem is not well suited for a solution 
of this production problem because there is no agreed-upon time during 
which the machines definitely require the attention of the workman, 
and the times the workman spends at any one machine are not constant. 
Aside from this basic reason, we can point to the complexity of calculations
in the encounter problem for the case of a large number
of persons (machines). This case often comes up (in the textile industry,
for instance, some weavers operate up to 280 looms).

The theory of geometrical probability has repeatedly been criticized 
for arbitrariness in determining the probabilities of events. Many 
authors have become convinced that for an infinite number of outcomes
the probability cannot be determined objectively, that is, independently
of the mode of computation. A particularly brilliant proponent
of this scepticism is the French mathematician of the 19th century
Joseph Bertrand. In his course of probability theory he cited a
number of problems on geometrical probability in which the result 
depended on the mode of solution. The following is an illustration. 

Example 2. Bertrand’s Paradox. A chord is chosen at random in a 
circle. What is the probability that its length will exceed the length 
of the side of the equilateral triangle inscribed in the circle? 

Solution 1. For reasons of symmetry we can specify the direction
of the chord beforehand. Draw a diameter perpendicular to this direction.
It is obvious that only chords that intersect the diameter in
the interval between one fourth and three fourths its length will


* In Sec. 9 we will see that by virtue of the independent arrival times of
A and B the probability that A will arrive in the interval from $x$ to $x+h$,
and B within the interval between $y$ and $y+s$, is equal to
$\frac{h}{60} \cdot \frac{s}{60}$, that is, it
is proportional to the area of a rectangle with sides $h$ and $s$.





exceed the side of the regular triangle. The desired probability is
thus $\frac{1}{2}$.

Solution 2. Reasoning from symmetry, it is possible to fix one 
of the ends of the chord on the circle in advance. The tangent to 
the circle at this point and two sides of the regular triangle with 
vertex in this point form three 60° angles. Only chords falling in the 
middle angle are favourable to the conditions of the problem. For 
this mode of calculation, the sought-for probability will come out to 
1/3. 



Fig. 4 


Solution 3. To define the position of the chord it is sufficient to 
specify its midpoint. So that the chord will satisfy the condition 
of the problem, it is necessary that its midpoint lie inside a circle 
concentric with the given one but with one-half the radius. The 
area of this circle is equal to one-fourth the area of the given one. 

Thus, the sought-for probability is $\frac{1}{4}$.

We now have to find out wherein lies the nonuniqueness of the 
solution of our problem. Is the reason a fundamental impossibility 
to determine the probability for cases of an infinite number of possible
outcomes or is it because some of our starting premises were
impermissible? 

It is easy to see that solutions of three different problems are given 
as the solution of one and the same problem due to the fact that the 
conditions of the problem do not define the notion of drawing a chord 
at random. 

Indeed, in the first solution, a circular cylindrical rod (Fig. 4a)
is made to roll along one of the diameters. The set of all possible
stopping points of this rod forms the set of points of a segment AB
of length equal to the diameter. Equiprobable events are those which
consist in a stop occurring in an interval of length $h$, no matter
where this interval is located on the diameter.

In the second solution, a rod fixed on a hinge situated on one of 
the points of the circle is made to oscillate no more than 180° (Fig. 4b).
It is assumed here that a stop of the rod inside an arc of length h 





of the circle depends solely on the length of the arc but does not depend
on its position. Thus, equally probable events are stops of the
rod in any arcs of the circle that have the same length. After such a
simple calculation, it becomes quite obvious that the definitions of
probability in the first and second solutions are discrepant. According
to the first solution, the probability that the rod will stop in the interval
from $A$ to $x$ is $\frac{x}{D}$, where $D$ is the diameter. The probability that the projection of the point
of intersection of the rod with the circle in the second solution will
fall in the same interval is, as elementary geometric calculations show,
equal to

$$\frac{1}{\pi} \arccos \frac{D - 2x}{D} \quad \text{for } x \le \frac{D}{2}$$

and

$$1 - \frac{1}{\pi} \arccos \frac{2x - D}{D} \quad \text{for } x \ge \frac{D}{2}$$

Finally, in the third solution we throw a point into the circle at 
random and ask ourselves about the probability of its falling within 
a certain smaller concentric circle (Fig. 4c). 

The different statements of the problems in all three cases are quite 
obvious. 
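The three readings of "a chord chosen at random" can be simulated side by side. The sketch below assumes a circle of radius 1, so that the side of the inscribed equilateral triangle is $\sqrt{3}$; the sampler names are illustrative and each one encodes one of the three mechanisms described above.

```python
import math
import random

SIDE = math.sqrt(3.0)   # side of the equilateral triangle inscribed in a unit circle


def chord_via_diameter(rng):
    """Solution 1: the chord crosses a fixed diameter at a uniformly chosen point."""
    d = rng.uniform(-1.0, 1.0)              # signed distance from the centre
    return 2.0 * math.sqrt(1.0 - d * d)


def chord_via_endpoint(rng):
    """Solution 2: one end is fixed, the other end is uniform on the circle."""
    theta = rng.uniform(0.0, 2.0 * math.pi)  # central angle between the two ends
    return 2.0 * abs(math.sin(theta / 2.0))


def chord_via_midpoint(rng):
    """Solution 3: the midpoint of the chord is uniform over the disc."""
    r = math.sqrt(rng.random())              # radius of a point uniform in the disc
    return 2.0 * math.sqrt(1.0 - r * r)


def estimate(sampler, trials=200_000, seed=1):
    """Frequency with which the sampled chord is longer than the triangle side."""
    rng = random.Random(seed)
    return sum(sampler(rng) > SIDE for _ in range(trials)) / trials


for sampler in (chord_via_diameter, chord_via_endpoint, chord_via_midpoint):
    print(sampler.__name__, estimate(sampler))
```

The three frequencies settle near 1/2, 1/3 and 1/4 respectively, which illustrates that the discrepancy comes from three different sampling mechanisms rather than from any defect in the notion of probability.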

Example 3. Buffon’s Needle Problem. A plane is partitioned by
parallel lines separated by a distance of $2a$. A needle of length
$2l$ ($l < a$) is thrown at random* onto the plane. Find the probability
of the needle lying athwart one of the lines.

Solution. Denote by $x$ the distance from the centre of the needle to the closest
parallel and by $\varphi$ the angle formed by the needle and this parallel.
The quantities $x$ and $\varphi$ fully define the position of the needle. All
possible positions of the needle are defined by points of a rectangle
with sides $a$ and $\pi$. From Fig. 5 it is seen that for the needle to cross
one of the parallel lines it is necessary and sufficient that

$$x \le l \sin\varphi$$

The sought-for probability is, by the assumptions that have been
made, equal to the ratio of the area shaded in Fig. 6 to the area of
the rectangle:

$$p = \frac{1}{a\pi} \int_0^{\pi} l \sin\varphi \, d\varphi = \frac{2l}{a\pi}$$


* Here, “at random” implies that, firstly, the centre of the needle falls at
random on a segment of length $2a$ perpendicular to the drawn straight lines,
secondly, the probability that the angle $\varphi$ made by the needle and the drawn
lines will lie between $\varphi_1$ and $\varphi_1 + \Delta\varphi$ is proportional to $\Delta\varphi$ and, thirdly, that the
quantities $x$ and $\varphi$ are independent (see Sec. 9).





It will be noted that Buffon’s needle problem served as the starting 
point for solving certain problems in the theory of gunfire that take 
shell sizes into account. 

The formula obtained was employed for an experimental determination
of the approximate value of the number $\pi$. A large number of
needle-throwing experiments have been carried out. A few are listed
below.


Experimenter    Year    Number of throws    Experimental value
Wolf            1850    5000                3.1596
Smith           1855    3204                3.1553
Fox             1894    1120                3.1419
Lazzarini       1901    3408                3.1415929


Since from the formula we obtained there follows the equation

$$\pi = \frac{2l}{ap}$$

for a large number $n$ of throws $\pi$ is approximately equal to

$$\pi \approx \frac{2ln}{am}$$

where $m$ is the number of intersections obtained.
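A needle-throwing experiment of this kind is easy to imitate numerically. The sketch below assumes the sampling scheme of the footnote to Example 3 (distance $x$ uniform on $[0, a]$, angle $\varphi$ uniform on $[0, \pi]$, the two independent); the function name and the default values of $l$, $a$, $n$ and the seed are illustrative. Note that the simulation itself uses the known value of $\pi$ to draw the angle, so it illustrates the formula rather than computing $\pi$ from scratch.

```python
import math
import random


def buffon_pi_estimate(l=1.0, a=2.0, n=200_000, seed=1):
    """Return (estimate of pi, number of intersections m) after n simulated throws."""
    rng = random.Random(seed)
    m = 0
    for _ in range(n):
        x = rng.uniform(0.0, a)            # distance from needle centre to nearest line
        phi = rng.uniform(0.0, math.pi)    # angle between the needle and the lines
        if x <= l * math.sin(phi):         # crossing condition from the solution
            m += 1
    return 2.0 * l * n / (a * m), m


estimate, intersections = buffon_pi_estimate()
print(intersections, estimate)
```

Even with a few hundred thousand throws the estimate is typically reliable only to about two decimal places.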



It will be noted that the results of Fox and Lazzarini are
unreliable. In the experiment of the latter, the value of $\pi$ is given
to six exact decimal places. Changing the number of intersections
(the number $m$) by unity affects at least the fourth decimal if $n$ is
less than 5000. Indeed, since $a > l$,

$$\frac{a(m+1)}{2ln} - \frac{am}{2ln} = \frac{a}{2ln} > \frac{1}{2n} > 0.0001$$





There is consequently only one value of $m$ which could give the
value of $\pi$ found by Lazzarini. As we shall see in Chapter 2, the
probability of obtaining exactly $m$ intersections may be calculated
approximately from the formula

$$P_n(m) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \, e^{-\frac{(m-np)^2}{2np(1-p)}}$$


If for the sake of definiteness we assume that $a = 2l$, then in the
Lazzarini experiments we find for any $m$ that

$$P_n(m) \le \frac{1}{\sqrt{2\pi np(1-p)}} < 0.03$$


Thus, the probability of obtaining Lazzarini’s result is less than 
1/30. 
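The bound is a one-line computation. The sketch below assumes, as in the text, $a = 2l$ (so that $p = 2l/(a\pi) = 1/\pi$) and $n = 3408$ throws; the function name is illustrative.

```python
import math


def max_local_probability(n, p):
    """Largest value of the approximating formula, reached when m = n*p."""
    return 1.0 / math.sqrt(2.0 * math.pi * n * p * (1.0 - p))


n = 3408                 # number of throws reported for Lazzarini
p = 1.0 / math.pi        # crossing probability 2l/(a*pi) under the assumption a = 2l
bound = max_local_probability(n, p)
print(bound)             # roughly 0.015, comfortably below 0.03
```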


Example 4. Throw at random* a convex contour of diameter less
than $2a$ on a horizontal plane partitioned by parallel lines separated
by a distance of $2a$. Find the probability that the contour will intersect
one of the parallel lines.

Solution. First suppose that the convex contour is an $n$-sided polygon.
Let the sides be numbered from 1 to $n$. If the polygon intersects
a parallel line, then this intersection should occur with two of the sides.
Denote by $p_{ij} = p_{ji}$ the probability that the intersection will occur
with the $i$th and $j$th sides. Obviously, an event $A$ which consists
in the thrown polygon intersecting one of the parallel lines may be
represented in the form of the following sum of pairwise mutually
exclusive events:

$$A = (A_{12} + A_{13} + \dots + A_{1n}) + (A_{23} + A_{24} + \dots + A_{2n}) + \dots + (A_{n-2,\,n-1} + A_{n-2,\,n}) + A_{n-1,\,n}$$

where $A_{ij}$ ($i = 1, 2, \dots$; $j = 1, 2, \dots$) is an event which
consists in an intersection of the $i$th and $j$th sides with the parallel
line. By the addition theorem for probabilities

$$
\begin{aligned}
p = P(A) &= [P(A_{12}) + P(A_{13}) + \dots + P(A_{1n})] + [P(A_{23}) + \dots + P(A_{2n})] + \dots + P(A_{n-1,\,n}) \\
&= (p_{12} + p_{13} + \dots + p_{1n}) + (p_{23} + p_{24} + \dots + p_{2n}) + \dots + p_{n-1,\,n}
\end{aligned}
$$


* Here, “at random” means that we take any segment rigidly fixed to the 
curve and throw it at random in the meaning of the preceding example. 

It is easy to demonstrate that the notion “at random” thus defined is independent
of the choice of the given segment.





Taking advantage of the equation $p_{ij} = p_{ji}$ we can write the probability
$p$ in a different way:

$$p = \frac{1}{2}\,[(p_{12} + p_{13} + \dots + p_{1n}) + (p_{21} + p_{23} + \dots + p_{2n}) + \dots + (p_{n1} + p_{n2} + \dots + p_{n,\,n-1})]$$

But the sum $\sum_{j=1}^{n} p_{ij}$, where we put $p_{ii} = 0$, is the probability of the
intersection of the $i$th side of the polygon with one of the parallel
lines. If the length of the $i$th side is denoted by $2l_i$, then from
Buffon’s problem we find that

$$\sum_{j=1}^{n} p_{ij} = \frac{2l_i}{\pi a}$$

and, consequently,

$$p = \frac{2\sum_{i=1}^{n} l_i}{2\pi a}$$

Denoting by $2s$ the perimeter of the polygon, we obtain

$$p = \frac{s}{\pi a}$$

We thus see that the probability p does not depend either on the 
number of sides or on the lengths of the sides of the polygon. From 
this we conclude that the formula that was found holds for any convex 
contour, for it can always be considered as the limit of a convex polygon
with the number of sides increasing to infinity.
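The result $p = s/(\pi a)$ can be tested on a concrete polygon. The following is a sketch only: it assumes the lines are $y = 2ak$ for integer $k$, models the random throw by a uniform rotation and a uniform vertical shift of a reference point of the polygon, and uses illustrative names (`crossing_frequency`, `predicted_probability`). It requires Python 3.8 or later for `math.dist`.

```python
import math
import random


def crossing_frequency(vertices, a=1.0, trials=200_000, seed=1):
    """Share of random throws in which the polygon meets one of the lines y = 2*a*k."""
    rng = random.Random(seed)
    spacing = 2.0 * a
    hits = 0
    for _ in range(trials):
        angle = rng.uniform(0.0, 2.0 * math.pi)   # random orientation
        shift = rng.uniform(0.0, spacing)         # random height of the reference point
        s, c = math.sin(angle), math.cos(angle)
        ys = [shift + x * s + y * c for x, y in vertices]
        # A line is met when the lowest and highest vertices lie in different strips.
        if math.floor(min(ys) / spacing) != math.floor(max(ys) / spacing):
            hits += 1
    return hits / trials


def predicted_probability(vertices, a=1.0):
    """The value s / (pi * a), where 2s is the perimeter of the polygon."""
    k = len(vertices)
    perimeter = sum(math.dist(vertices[i], vertices[(i + 1) % k]) for i in range(k))
    return (perimeter / 2.0) / (math.pi * a)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # diameter sqrt(2) < 2a
print(crossing_frequency(square), predicted_probability(square))
```

Replacing the square by a triangle of different perimeter changes both numbers together, in line with the conclusion that only the perimeter matters.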


Sec. 7. Frequency and Probability 

When passing from elementary cases to complex problems, especially
those dealt with in natural science or technology, the classical
definition of probability encounters insuperable difficulties of a fundamental
nature. First of all, in most cases the question arises of
the possibility of finding a reasonable way of isolating the “equally 
probable cases”. For example, for reasons of symmetry (on which our 
arguments are based concerning the equiprobability of events), it 
appears at present to be at least difficult to derive the probability 
of decay of an atom of a radioactive substance within a given time 
interval or to determine the probability that a child which is to be 
born will be a boy. 

Prolonged observations of the occurrence or nonoccurrence of an 
event A for a large number of repeated trials that occur under an invariable
set of conditions $\mathfrak{S}$ show that for a broad range of phenomena
the number of occurrences or nonoccurrences of A obeys stable laws. 





Namely, if we denote by $\mu$ the number of occurrences of an event A
in $n$ independent trials, it will be found that the ratio $\frac{\mu}{n}$ for sufficiently
large $n$ in the majority of such series of observations will be almost
constant, large deviations being progressively rarer as the number
of trials is increased.
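This stability of the ratio $\mu/n$ can be watched in a simulated series of trials. The sketch below assumes independent trials with a fixed success probability $p$, which is a modelling assumption and not something the observations themselves guarantee; the function name and default values are illustrative.

```python
import random


def frequency_after(n, p=0.5, seed=1):
    """Ratio mu/n, where mu counts occurrences of the event in n simulated trials."""
    rng = random.Random(seed)
    mu = sum(rng.random() < p for _ in range(n))
    return mu / n


for n in (100, 10_000, 200_000):
    # The printed ratios drift closer to p as n grows.
    print(n, frequency_after(n))
```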

This kind of stability of frequencies (i.e., of the ratios $\frac{\mu}{n}$) was first
noted in phenomena of a demographic nature. In antiquity it was already
noticed that for whole states and for large cities the ratio of
the number of boys born to the total number of births remained
unchanged from year to year. In ancient China, in 2238 B.C., this
number was, on the basis of censuses, taken to be equal to 1/2. Later,
particularly in the 17th and 18th centuries, a number of fundamental 
studies were devoted to the statistics of population. It was found 
that apart from the stability in births of boys and girls there were 
observed stable regularities of a different character: the percentage 
of deaths in a definite age bracket for specific groups of the population 
(of a particular economic and social background), the distribution 
of persons (of one sex, age and nationality) as to height, breadth of 
chest, length of footstep, etc. 

Laplace, in his book Essai philosophique sur les probabilités, relates
a very indicative episode that occurred when he was studying the
regularities of birth of boys and girls. Extensive statistical materials
that he had studied dealing with London, St. Petersburg, Berlin,
and all of France yielded almost exactly coincident ratios of the number
of births of boys to the total number of births. Over many decades
all these ratios fluctuated about one and the same number, approximately
equal to 22/43. Yet a study of similar statistical materials
of Paris for the 40 years between 1745 and 1784 produced a slightly
different number 25/49. Laplace was intrigued by such a substantial
difference and he began to search for a rational explanation. A detailed
study of the archives showed that the total number of births in
Paris included all foundlings. It also came to light that the surrounding
population had a preference for abandoning infants of one sex. This
social phenomenon was at that time so common that it had appreciably
distorted the true picture of births in Paris. When Laplace eliminated
the foundlings from the total number of births, it was found that
for Paris as well the ratio of boy births to the total number of births
was likewise stable and that it was close to the number 22/43, which
is the same for other peoples and for France as a whole.

Since Laplace’s time, extensive statistical material has accumulated
that permits very confident predictions to be made of the quantitative
characteristics of socially important demographic phenomena.
In conclusion, we give rather recent statistical findings of a nearly 
constant frequency for a large number of trials: the distribution of 





newborn infants as to sex by month (see Table 3). The findings are 
taken from H. Cramér’s book Mathematical Methods of Statistics
and represent the official data of Swedish statistics for 1935. 

Figure 7 shows the deviations of the frequency of girl births by month 
from the frequency of girl births for the year. We see that the frequency 
fluctuates about the number 0.482. 
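The monthly frequencies can be recomputed directly from the counts in Table 3. This is only a sketch: the lists below transcribe the girl and total birth counts for months 1 to 12 in the order reconstructed from the table, and that month order is an assumption. Recomputing in this way gives about 0.477 for month 7 rather than the printed 0.462, and a few other entries differ in the last digit, which may point to misprints in the table.

```python
# Counts for months 1..12 as read from Table 3 (month order is an assumption).
girls = [3537, 3407, 3866, 3711, 3775, 3665, 3621, 3596, 3491, 3391, 3160, 3371]
births = [7280, 6957, 7883, 7884, 7892, 7609, 7585, 7393, 7203, 6903, 6552, 7132]

# Frequency of girl births in each month and over the whole year.
monthly = [g / b for g, b in zip(girls, births)]
yearly = sum(girls) / sum(births)

for month, freq in enumerate(monthly, start=1):
    # Deviation of the monthly frequency from the yearly one, as plotted in Fig. 7.
    print(month, round(freq, 3), round(freq - yearly, 4))
print("year", round(yearly, 4))
```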

It turns out that for those cases to which the classical definition 
of probability is applicable, the fluctuation of frequency occurs 
about the probability of the event p. 


Fig. 7. Frequency of girl births by month (1 to 12), with the yearly value 0.4825 marked

There is extensive experimental material to verify this fact. Coin 
tossing, die throwing and needle dropping have all been used to determine
empirically the number $\pi$ (see Example 3 of Sec. 6), and other
things. We give some of the results obtained in coin tossing. 


Experimenter     Number of throws    Number of “heads”    Frequency
Buffon           4040                2048                 0.5080
Karl Pearson     12,000              6019                 0.5016
Karl Pearson     24,000              12,012               0.5005


At the present time there are other verifications of this empirical 
fact which are of important scientific and applied value. In modern 
statistics, an important role is played by tables of random numbers 
in which each number is chosen at random from the set of digits 
0, 1, 2, ..., 9. In one of the tables the number 7 appears 968 times 
in the first 10,000 random numbers, which gives it a frequency of 
0.0968 (the probability of occurrence of 7 is equal to 0.1). Computing 
the number of occurrences of 7 in a sequence of one thousand random 
numbers, we get the following: 

No. of thousand     1      2      3      4      5      6      7      8      9      10
Number of sevens    95     88     95     112    95     99     82     89     111    102
Frequency           0.095  0.088  0.095  0.112  0.095  0.099  0.082  0.089  0.111  0.102





TABLE 3

Month       Total births    Boys      Girls     Frequency of girls
1           7280            3743      3537      0.486
2           6957            3550      3407      0.489
3           7883            4017      3866      0.490
4           7884            4173      3711      0.471
5           7892            4117      3775      0.478
6           7609            3944      3665      0.482
7           7585            3964      3621      0.462
8           7393            3797      3596      0.484
9           7203            3712      3491      0.485
10          6903            3512      3391      0.491
11          6552            3392      3160      0.482
12          7132            3761      3371      0.473
One year    88,273          45,682    42,591    0.4825


The frequencies of occurrence of a seven 
in the different thousands fluctuate rather 
considerably, but are still comparatively 
close to the probability. 
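The size of these fluctuations is easy to quantify from the counts quoted above. A minimal sketch, with illustrative variable names:

```python
# Number of sevens in each successive thousand of random digits, as quoted in the text.
sevens = [95, 88, 95, 112, 95, 99, 82, 89, 111, 102]

frequencies = [k / 1000 for k in sevens]       # frequency within each thousand
overall = sum(sevens) / 10_000                 # frequency over all 10,000 digits
largest_gap = max(abs(f - 0.1) for f in frequencies)

print(frequencies)
print(overall)        # frequency over the full 10,000 digits
print(largest_gap)    # largest departure of a single thousand from 0.1
```

The frequency over all 10,000 digits lies closer to 0.1 than the worst single thousand does, as one would expect from the longer run.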

The fact that in a large number of trials the frequency of a series of random
events remains almost constant compels us to presume that there are
regularities governing the course of the phenomenon which are independent
of the investigator, the manifestation of which regularities is this
nearly constant frequency. Again, the fact that the frequency of events, to
which the classical definition of probability is applicable, is as a rule
(in a large number of trials) close to the probability compels us to the
view that in the general case there is some constant about which the
frequency fluctuates. It is natural to term this constant, which is an
objective numerical characteristic of the phenomenon, the probability
of the random event A under study.

We will thus say that event A has a probability if it possesses the
following peculiarities:

(a) it is possible, at least theoretically, to carry out under the same
conditions $\mathfrak{S}$ an unlimited number of independent trials, in each
of which A may or may not occur;

(b) as a result of a sufficiently large number of trials it is noted that
the frequency of the event A for nearly every large run of trials departs
only slightly from a certain (generally speaking, unknown) constant.

In a large number of trials, the numerical value of this constant may
approximately be taken to be the frequency of the event A or a number
close to the frequency. The probability of a random event
thus defined is called statistical probability.

Note that frequency has the following 
properties: 

(1) the frequency of an event that is certain is unity;








(2) the frequency of the impossible event is zero; 

(3) if a random event C is the sum of mutually exclusive events
$A_1, \dots, A_n$, then its frequency is equal to the sum of the frequencies
with which the component events occur.

Quite naturally, in the case of the statistical definition, we must 
require that probability satisfy the following properties: 

(1) the probability of an event that is certain is unity; 

(2) the probability of the impossible event is zero; 

(3) if a random event C is the sum of a finite number of mutually
exclusive events $A_1, A_2, \dots, A_n$ having probability, then its probability
exists and is equal to the sum of the probabilities of the components:

$$P(C) = P(A_1) + P(A_2) + \dots + P(A_n)$$

The statistical definition of probability given here is descriptive 
rather than formally mathematical in character. It is deficient in 
yet another aspect as well: it does not lay bare the actual peculiarities
of those phenomena for which the frequency is stable. This
is to stress the necessity of further investigations in the indicated 
direction. However, and this is particularly important, in the given 
definition we retain the objective character of probability that is 
independent of the investigator. The fact that only after performing 
certain preliminary observations we can judge that some event has 
a probability does not in the least detract from our conclusions, 
for a knowledge of regularities is never derived from nothing; it 
is always preceded by experiment and observation. Of course, these 
regularities existed prior to the intervention of the experimenting 
thinking person, but they were simply unknown to science. 

We have already said that we have not given a formally mathematical
definition of probability but have only postulated its existence
for certain conditions and have indicated a method for an approximate
evaluation of it. Any objective property of a phenomenon
subjected to study, including the probability of event A, must be
determined solely from the structure of the phenomenon, irrespective
of whether an experiment is performed or not and whether an experimenting
intellect is present or not. Nevertheless, experiment plays
an essential role: first of all, it is precisely experiment that
enables one to perceive theoretico-probabilistic regularities in nature,
secondly, it permits finding in approximate fashion certain probabilities
of the events under study, and, finally, it enables us to verify
the correctness of the theoretical premises that we use in our investigations.
This circumstance requires an explanation.

Suppose that certain arguments suggest that the probability of 
some event A is p. Further, in a series of independent trials let it 
be that the frequencies, for the most part, deviate substantially from p. 
This justifies any doubt we may entertain about the correctness of 





our a priori judgements and justifies undertaking a more detailed
study of the premises on which our a priori conclusions were based.
For instance, we assume that the die we are using has a regular geometric
form and that the material it is made of is homogeneous. From
these preliminary premises we are entitled to conclude that when throwing
the die the probability of any face coming up (say, with number 5)
must be equal to 1/6. If repeated series of large numbers of trials 
(tosses) systematically demonstrate that the frequency of occurrence 
of this face departs significantly from 1/6, then we will not doubt 
the existence of a definite probability of this face turning up, but will 
be skeptical about our premises concerning the regularity of the die 
or proper organization of the trials (tosses). 

In conclusion, we must examine the very widespread (particularly 
among naturalists) conception of probability given by R. von Mises. 
According to von Mises, since the frequency deviates less and less 
from the probability p as the number of experiments increases, we 
should have in the limit 

$$p = \lim_{n \to \infty} \frac{\mu}{n}$$

R. von Mises proposes to regard this equation as a definition of the 
concept of probability. In his opinion, any a priori definition is 
doomed and only his own empirical definition is capable of ensuring 
the interests of natural science, mathematics and philosophy; and 
since the classical definition has only an extremely limited application,
while the statistical definition is applicable to all cases of
scientific interest, von Mises proposes rejecting outright the classical 
definition in terms of equal probability based on symmetry. What 
is more, von Mises considers it quite unnecessary to elucidate the structure
of the phenomena for which probability is an objective numerical 
characteristic, it being regarded as sufficient to have an empirical 
stability of frequency. 

Von Mises believes that the theory of probability has to do with
infinite sequences (called collectives) of outcomes of trials. Every
collective must possess two properties:

(1) the existence of limiting values of relative frequencies of those 
of its members which possess some property of a certain specific 
group of members; 

(2) randomness, i.e., invariance of these limits relative to extraction
(from the collective) of any subsequence in accordance with
a law that is arbitrary except that it must not be based on any difference
of the elements of the collective with respect to the property
under consideration.

The construction of a mathematical theory based on fulfillment 
of both these requirements encounters insuperable logical difficulties.
The point is that the requirement of randomness is found to be





incompatible with the requirement of the existence of a limit. We 
shall not dwell on the details of von Mises’ theory. We refer the reader 
to his book Probability, Statistics, and Truth. For extensive criticism, 
see articles by A. Ya. Khinchin*. 

It will be noted that observations of statistical stability of frequencies
of many actual phenomena served as the starting point for constructing
a theory of probability as a mathematical science. The
relations that obtain for frequencies serve as the prototype of the
principal relations that are satisfied by the probabilities of appropriate
events. From this it is clear why the theory of probability may
be defined as the domain of mathematics that treats of mathematical 
models of random phenomena which possess the property of stability 
of frequencies. 

Sec. 8. An Axiomatic Construction of the Theory 
of Probability 

Up until fairly recently, the theory of probability had not established
itself as a mathematical science and the basic concepts were
not defined with sufficient precision. This vagueness frequently led
to paradoxical conclusions (recall the paradoxes of Bertrand). Quite
naturally, applications of probability theory to the study of natural
phenomena were but feebly substantiated and occasionally encountered
sharp and justified criticism. It must be said that these circumstances
did not greatly embarrass natural scientists and their
naive theoretico-probabilistic approach in various fields of science
led to big successes. The development of natural science at the start
of this century made stringent demands on probability theory. It
became necessary to study systematically the basic concepts of probability
theory and to clarify the conditions under which the results
of the theory could be employed. That is why a formal-logical substantiation
of the theory of probability and its axiomatic construction
became so important. In this approach, certain premises, which were
a generalization of many centuries of human experience, had to be
laid to form the foundation of probability theory as a mathematical
science. Its subsequent development must build up via deductions
from these basic principles without resort to pictorial conceptions and
“common sense” conclusions. In other words, probability theory must
be constructed from axioms just like any other established mathematical
science such as geometry, theoretical mechanics, abstract group
theory, etc.

In modern mathematics, axioms are taken to be propositions that 


* “Mises on Probability and the Principles of Physical Statistics”, Uspekhi
fizicheskikh nauk, IX, Issue 2, 1929. “The Frequency Theory of R. von Mises
and Modern Ideas of Probability Theory”, Voprosy filosofii, No. 1, pp. 91-102;
No. 2, pp. 77-89, 1961. Both in Russian.





are regarded as true and are not proved within the framework of the 
given theory. All other propositions of the theory have to be derived 
from the accepted axioms in purely logical fashion. Formulation of 
the axioms (that is, the fundamental propositions on the basis of 
which an extensive theory is built up) does not represent the initial 
stage in the development of mathematical science, but is the result 
of a prolonged accumulation of facts and a logical analysis of the results
obtained with the purpose of revealing the actual basic primary
facts. That precisely is the way in which the axioms of geometry that 
are studied in elementary mathematics took shape. The same pathway 
was taken by probability theory, in which the axiomatic construction
of its principles was carried out in comparatively recent times.
For the first time, the problem of an axiomatic construction of probability
theory as a logically complete science was posed and solved
in 1917 by the noted mathematician S. N. Bernstein, who proceeded 
from a qualitative comparison of random events on the basis of their 
greater or lesser probability. 

A different approach has been proposed by A. N. Kolmogorov, 
which closely relates the theory of probability with the modern metric 
theory of functions and also set theory. This book will follow
Kolmogorov’s approach.

We shall see that the axiomatic construction of the principles of 
probability theory proceeds from the basic properties of probability 
noticed in examples of the classical and statistical definitions. Thus, 
the axiomatic definition of probability includes both the classical 
and the statistical definitions as particular cases and overcomes the 
deficiencies of each of them. On this basis it was possible to construct 
a logically perfect structure of the modern theory of probability and 
at the same time to satisfy the enhanced requirements of modern
natural science.

In Kolmogorov’s axiomatics of probability theory the concept
of a random event is not primary and is constructed out of more elementary
notions. We already encountered that approach when examining
certain examples. For instance, in problems dealing with
the geometrical definition of probability a region G of space was
examined (of a straight line, a plane, etc.) on which a point is thrown
at random. Here, the random events are falls in certain subregions
of G. Every random event is here a certain subset of the set of points G.
This idea underlies the general concept of a random event in the
axiomatics of Kolmogorov.

Kolmogorov starts from a set (a space) U of simple (elementary)
events. The nature of the elements of this set is immaterial for the
logical development of the theory of probability. The next to be
considered is a certain family F of subsets of the set U; the elements
of the family F are called random events. It is assumed that the
following three requirements are fulfilled relative to the structure
of the family F:





(1) F contains the set U as one of its elements. 

(2) If A and B, subsets of U, are elements of F, then the sets
$A+B$, $AB$, $\bar{A}$ and $\bar{B}$ are also elements of F.

Here, $A+B$ is understood to be the set composed of the elements of
U that belong either to A or to B or to both A and B; by $AB$ is
understood the set consisting of the elements of U belonging both
to A and to B; and, finally, by $\bar{A}$ ($\bar{B}$), the set of
elements of U that do not belong to A (to B).

Insofar as the entire set U belongs to F (as an element), by the
second requirement F also contains $\bar{U}$, that is, F contains the
empty set as an element.

It is readily seen that the second requirement implies belonging 
to the set F of sums, products and complements of a finite number 
of events belonging to F. Thus, elementary operations on random 
events cannot take us beyond the limits of the set of random events. 
As in Sec. 3, we shall call the family of events F a field of events. 

In many very important problems we shall have to demand more 
of a field of events, namely: 

(3) If subsets $A_1, A_2, \ldots, A_n, \ldots$ of set U are elements
of the set F, then their sum $A_1 + A_2 + \ldots + A_n + \ldots$ and
product $A_1 A_2 \ldots A_n \ldots$ are also elements of F.

The set F formed in this fashion is called a Borel field of events
(another now frequently used term is "σ-algebra of events").

The foregoing definition of a random event is in full agreement 
with the picture we got when examining concrete examples. To make 
this still clearer let us consider two instances in detail from this 
viewpoint. 


Example 1. A die is thrown. The set U of elementary events consists
of six elements: $E_1, E_2, E_3, E_4, E_5, E_6$. Here, $E_i$ signifies
a roll of i points. The set F of random events consists of the
following $2^6 = 64$ elements: $(V)$, $(E_1)$, $(E_2)$, $(E_3)$,
$(E_4)$, $(E_5)$, $(E_6)$, $(E_1, E_2)$, $(E_1, E_3)$, ...,
$(E_5, E_6)$, $(E_1, E_2, E_3)$, ..., $(E_4, E_5, E_6)$,
$(E_1, E_2, E_3, E_4)$, ..., $(E_3, E_4, E_5, E_6)$,
$(E_1, E_2, E_3, E_4, E_5)$, ..., $(E_2, E_3, E_4, E_5, E_6)$,
$(E_1, E_2, E_3, E_4, E_5, E_6)$.

Here, each pair of parentheses indicates which of the elements of
set U are used to make up a subset belonging to F (as an element);
the symbol (V) indicates the empty set.
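
The construction in Example 1 can be checked mechanically. The following is a minimal sketch, not part of the original text: the names `U`, `F` and `is_field` are illustrative, elementary events are represented by the integers 1 to 6, and only requirements (1) and (2) are tested.

```python
from itertools import combinations

# Elementary events E_1..E_6, labelled by the number of points rolled.
U = frozenset(range(1, 7))

# F: every subset of U, from the empty set (V) up to U itself.
F = {frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)}

def is_field(F, U):
    """Check requirements (1) and (2): U is in F, and F is closed
    under sums (unions), products (intersections) and complements."""
    if U not in F:
        return False
    for A in F:
        if U - A not in F:
            return False
        for B in F:
            if (A | B) not in F or (A & B) not in F:
                return False
    return True

print(len(F))          # 64
print(is_field(F, U))  # True
```

A family such as `{U, frozenset({1})}` fails the same check, because the complement of the one-point set is missing from it.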


Example 2. Encounter Problem. The set U consists of points of
the square: $0 \le x \le 60$, $0 \le y \le 60$.

The set F consists of all Borel sets composed of points of this square.
In particular, the set consisting of points of the closed region
$|x - y| \le 20$ is contained in F and is a random event.

It is natural to introduce the following definitions. 

If two random events A and B have no elements of the set U in
common, we shall call them mutually exclusive.

The random event U will be called certain, and the random event
V (empty set) the impossible event. Events A and $\bar{A}$ are contrary
(complementary) events.

Now we can formulate the axioms that define probability. 

Axiom 1. With each random event A in the field of events F 
there is associated a nonnegative number P(A), called its probabi¬ 
lity. 

Axiom 2. $P(U) = 1$.

Axiom 3 (Axiom of Addition). If events $A_1, A_2, \ldots, A_n$ are
pairwise mutually exclusive, then

$$P(A_1 + A_2 + \ldots + A_n) = P(A_1) + P(A_2) + \ldots + P(A_n)$$

For the classical definition of probability there was no need to 
postulate the properties expressed by Axioms 2 and 3, since these 
properties of probability were proved by us. And the assertion in 
Axiom 1 is contained in the classical definition of probability 
itself. 

From these axioms we shall derive several important elementary 
corollaries. * 

First of all, from the obvious equality

$$U = V + U$$

and Axiom 3 we conclude that

$$P(U) = P(V) + P(U)$$

Thus, 

1. The probability of the impossible event is zero. 

In similar fashion it is easy to detect that 

2. For any event A

$$P(\bar{A}) = 1 - P(A)$$

3. No matter what the random event A,

$$0 \le P(A) \le 1$$

4. If event A implies event B, then

$$P(A) \le P(B)$$

5. Let A and B be two arbitrary events. Insofar as the summands
in the sums $A + B = A + (B - AB)$ and $B = AB + (B - AB)$
are mutually exclusive events, in accordance with Axiom 3

$$P(A + B) = P(A) + P(B - AB); \qquad P(B) = P(AB) + P(B - AB)$$

* As we shall see in Sec. 29, the teaching of probability is reduced by these 
axioms to the theory of measure defined on Borel fields of sets. Probability 
itself is a nonnegative additive set function. 





From this follows the addition theorem for arbitrary events A
and B:

$$P(A + B) = P(A) + P(B) - P(AB)$$

By virtue of the nonnegativity of P(AB) we conclude that

$$P(A + B) \le P(A) + P(B)$$

By induction we now derive that if $A_1, A_2, \ldots, A_n$ are
arbitrary events, we have the inequality

$$P(A_1 + A_2 + \ldots + A_n) \le P(A_1) + P(A_2) + \ldots + P(A_n)$$

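The system of axioms of Kolmogorov is consistent, for there
exist real objects that satisfy all these axioms. For example, if U
is taken as an arbitrary set with a finite number of elements
$U = \{a_1, a_2, \ldots, a_n\}$, F as the collection of all subsets
$\{a_{i_1}, a_{i_2}, \ldots, a_{i_s}\}$,
$1 \le i_1 < i_2 < \ldots < i_s \le n$, $0 \le s \le n$, then putting

$$P(a_1) = p_1, \quad P(a_2) = p_2, \quad \ldots, \quad P(a_n) = p_n$$

where $p_1, p_2, \ldots, p_n$ are arbitrary nonnegative numbers that
satisfy the equality $p_1 + p_2 + \ldots + p_n = 1$, and

$$P(a_{i_1}, a_{i_2}, \ldots, a_{i_s}) = p_{i_1} + p_{i_2} + \ldots + p_{i_s}$$

we will satisfy all of Kolmogorov's axioms.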
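
The finite model just described lends itself to a direct check. The sketch below is illustrative only: the helper names `make_P`, `subsets` and `satisfies_axioms` are assumptions of this example, the set U is taken to be the six faces of a die, and the unequal weights in `loaded` are an arbitrary choice made here. Axiom 3 is tested for two events, from which the case of n events follows by induction.

```python
from fractions import Fraction
from itertools import combinations

def make_P(point_probs):
    """Build P(A) as the sum of the point probabilities p_i over A."""
    assert all(p >= 0 for p in point_probs.values())
    assert sum(point_probs.values()) == 1
    return lambda A: sum((point_probs[a] for a in A), Fraction(0))

def subsets(U):
    """All subsets of U, i.e. the family F of the finite model."""
    items = sorted(U)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def satisfies_axioms(P, U):
    F = subsets(U)
    nonnegative = all(P(A) >= 0 for A in F)             # Axiom 1
    normed = P(U) == 1                                  # Axiom 2
    additive = all(P(A | B) == P(A) + P(B)              # Axiom 3
                   for A in F for B in F if not (A & B))
    return nonnegative and normed and additive

U = frozenset(range(1, 7))
fair = make_P({i: Fraction(1, 6) for i in U})
# Hypothetical unequal weights, chosen only for illustration.
loaded = make_P({1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4),
                 4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 12)})

print(satisfies_axioms(fair, U), satisfies_axioms(loaded, U))  # True True
print(fair(frozenset({1, 2, 3})), loaded(frozenset({1, 2, 3})))  # 1/2 3/4
```

Both assignments pass the same checks on the same family F while giving different probabilities to the same event, so the axioms alone do not single out one probability function.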

The system of axioms of Kolmogorov is incomplete: even for 
one and the same set U we can choose the probabilities in the 
set F in different ways. 

To take an example, in the case of the die that we examined
earlier we can either put

$$P(E_1) = P(E_2) = \ldots = P(E_6) = \frac{1}{6} \tag{1}$$

or

$$P(E_1) = P(E_2) = P(E_3) = \frac{1}{4}, \quad P(E_4) = P(E_5) = P(E_6) = \frac{1}{12} \tag{2}$$

and so forth.

The incompleteness of a system of axioms in probability theory 
is not an indication of an inapt choice or insufficient mental effort 
in their construction, but is due to the essence of the matter: in va¬ 
rious problems there may be phenomena whose study demands con¬ 
sideration of identical sets of random events but with different pro¬ 
babilities. For instance, there may be dice one of which is a true die 
(exact cube with identical density at every point) and the other not 
true. In the first case, the system of probabilities will be specified 
by the system of equations (1), the second, say, by the system (2). 

Further development of the theory requires an additional proposi¬ 
tion, which is called the extended axiom of addition. The new axiom 
has to be introduced due to the fact that in probability theory one 




constantly has to deal with events that decompose into an infinite 
number of particular cases. 

Extended Axiom of Addition. If an event A is equivalent to the
occurrence of at least one of the pairwise mutually exclusive events
$A_1, A_2, \ldots, A_n, \ldots$, then

$$P(A) = P(A_1) + P(A_2) + \ldots + P(A_n) + \ldots$$

It will be noted that the extended axiom of addition can be
replaced by the axiom of continuity, which is equivalent to it.

Axiom of Continuity. If a sequence of events
$B_1, B_2, \ldots, B_n, \ldots$ is such that each succeeding event
implies the preceding event and the product of all events $B_n$ is
the impossible event, then

$$P(B_n) \to 0 \quad \text{as } n \to \infty$$

We shall prove the equivalence of these propositions. 

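1. The axiom of continuity follows from the extended axiom of
addition. Indeed, let events $B_1, B_2, \ldots, B_n, \ldots$ be such that

$$B_1 \supset B_2 \supset \ldots \supset B_n \supset \ldots$$

and for any $n \ge 1$

$$\prod_{k=n}^{\infty} B_k = V \tag{3}$$

It is obvious that

$$B_n = \sum_{k=n}^{\infty} B_k \bar{B}_{k+1} + \prod_{k=n}^{\infty} B_k$$

Since the events in this sum are pairwise mutually exclusive,
it follows, according to the extended axiom of addition, that

$$P(B_n) = \sum_{k=n}^{\infty} P(B_k \bar{B}_{k+1}) + P\left(\prod_{k=n}^{\infty} B_k\right)$$

But by virtue of condition (3)

$$P\left(\prod_{k=n}^{\infty} B_k\right) = 0$$

therefore

$$P(B_n) = \sum_{k=n}^{\infty} P(B_k \bar{B}_{k+1})$$

that is, $P(B_n)$ is the remainder of the convergent series

$$\sum_{k=1}^{\infty} P(B_k \bar{B}_{k+1}) = P(B_1)$$

For this reason $P(B_n) \to 0$ as $n \to \infty$.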

2. The extended axiom of addition follows from the axiom of
continuity. Let events $A_1, A_2, \ldots, A_n, \ldots$ be pairwise
mutually exclusive and

$$A = A_1 + A_2 + \ldots + A_n + \ldots$$

We put

$$B_n = \sum_{k=n}^{\infty} A_k$$

It is clear that $B_{n+1} \subset B_n$. If event $B_n$ has occurred,
then some one of the events $A_i$ $(i \ge n)$ has occurred and, hence,
by virtue of the pairwise mutual exclusiveness of events $A_k$, events
$A_{i+1}, A_{i+2}, \ldots$ did not occur. Thus, events
$B_{i+1}, B_{i+2}, \ldots$ are impossible and, consequently, the event
$\prod_{k=n}^{\infty} B_k$ is impossible. By the axiom of continuity,
$P(B_n) \to 0$ as $n \to \infty$. Since

$$A = A_1 + A_2 + \ldots + A_n + B_{n+1}$$

we have, from the ordinary axiom of addition,

$$P(A) = P(A_1) + P(A_2) + \ldots + P(A_n) + P(B_{n+1}) = \lim_{n \to \infty} \sum_{k=1}^{n} P(A_k) = \sum_{k=1}^{\infty} P(A_k)$$

In conclusion we can say that from the point of view of set
theory, the axiomatic definition of probability that we have given
here is nothing other than the introduction into the set U of a
normed, countably additive, nonnegative measure defined on all
elements of the set F.

When defining the concept of probability we have to indicate not
only the initial set of elementary events U (in modern works it is
frequently denoted by the letter Ω as well), but also the set of random
events F and the function P defined on it. The collection {U, F, P}
is called a probability space.


Sec. 9. Conditional Probability and the Most Elementary Basic 
Formulas 

We have already stated that a certain set of conditions
$\mathfrak{S}$ underlies the definition of probability of an event.
If no constraints other than the conditions $\mathfrak{S}$ are imposed
in computing the probability P(A), the probability is unconditional.

However, in a number of cases it is necessary to find the probability
of events, given the supplementary condition that a certain event B
has occurred that has a positive probability. We shall call such
probabilities conditional and denote them by the symbol P(A/B);
this signifies the probability of event A on condition that event B
has occurred. Strictly speaking, unconditional probabilities are also
conditional probabilities, since for the starting point of the theory
we supposed the existence of a certain invariable set of conditions
$\mathfrak{S}$.
Example 1. Two dice are thrown. What is the probability that the
sum 8 comes up (event A) if it is known that this sum is an even
number (event B)?

Table 4 gives all possible cases for rolls of two dice. Each cell
indicates a possible event: in parentheses, the first number is the
number of points on the first die, the second, the number of points
on the second die in each throw.

The total number of possible cases is 36, of which 5 are favourable
to event A. Thus, the unconditional probability is

$$P(A) = \frac{5}{36}$$

If event B has occurred, then one of 18 (not 36) possibilities is
realized and, consequently, the conditional probability is

$$P(A/B) = \frac{5}{18}$$
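
The counting in Example 1 can be reproduced by listing the 36 cells of Table 4. This is a small sketch under the assumption that all cells are equally probable; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# The 36 equally probable cells of Table 4: (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

A = [o for o in outcomes if sum(o) == 8]        # the sum is 8
B = [o for o in outcomes if sum(o) % 2 == 0]    # the sum is even
AB = [o for o in A if o in B]                   # both A and B

P_A = Fraction(len(A), len(outcomes))
P_A_given_B = Fraction(len(AB), len(B))

print(P_A, P_A_given_B)  # 5/36 5/18
```

Knowing that B has occurred shrinks the set of possible cases from 36 to 18, while the favourable cases stay the same five, which is why the conditional probability doubles.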

Example 2. Two cards are drawn in succession from a deck of cards.
Find: (a) the unconditional probability that the second card will
be an ace (which card was drawn first is unknown), and (b) the
conditional probability that the second card will be an ace if the
first was an ace.


TABLE 4

(1,1)  (2,1)  (3,1)  (4,1)  (5,1)  (6,1)
(1,2)  (2,2)  (3,2)  (4,2)  (5,2)  (6,2)
(1,3)  (2,3)  (3,3)  (4,3)  (5,3)  (6,3)
(1,4)  (2,4)  (3,4)  (4,4)  (5,4)  (6,4)
(1,5)  (2,5)  (3,5)  (4,5)  (5,5)  (6,5)
(1,6)  (2,6)  (3,6)  (4,6)  (5,6)  (6,6)





Denote by A the event which consists in the occurrence of an
ace in the second place and by B the event consisting in the
appearance of an ace in the first place. Clearly, we have the equation

$$A = AB + A\bar{B}$$

By virtue of the fact that the events $AB$ and $A\bar{B}$ are mutually
exclusive, we have

$$P(A) = P(AB) + P(A\bar{B})$$

Drawing two cards from a deck of 36 cards can yield $36 \cdot 35$
(taking into account the order!) cases. Of them, there will be
$4 \cdot 3$ cases favourable to the event $AB$ and $32 \cdot 4$ cases
favourable to the event $A\bar{B}$. Thus,

$$P(A) = \frac{4 \cdot 3}{36 \cdot 35} + \frac{32 \cdot 4}{36 \cdot 35} = \frac{1}{9}$$

If the first card is an ace, there are 35 cards left and only
three aces. Hence,

$$P(A/B) = \frac{3}{35}$$
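
The count of ordered draws in Example 2 can be confirmed by brute force. The sketch below assumes the 36-card deck with four aces used in the text; the card labels and variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# A 36-card deck with 4 aces; cards are identified by their position.
deck = ["ace"] * 4 + ["other"] * 32

# All ordered pairs of distinct cards: 36 * 35 cases.
draws = list(permutations(range(len(deck)), 2))

second_ace = [d for d in draws if deck[d[1]] == "ace"]        # event A
first_ace = [d for d in draws if deck[d[0]] == "ace"]         # event B
both_aces = [d for d in second_ace if deck[d[0]] == "ace"]    # event AB

P_A = Fraction(len(second_ace), len(draws))
P_A_given_B = Fraction(len(both_aces), len(first_ace))

print(len(draws), P_A, P_A_given_B)  # 1260 1/9 3/35
```

The unconditional probability 1/9 equals 4/36, the chance that the first card is an ace, as one would expect when nothing is known about the first draw.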

The general solution to the problem of finding conditional
probability for the classical definition of probability does not
present any difficulty. Indeed, out of n uniquely possible, mutually
exclusive and equally probable events $A_1, A_2, \ldots, A_n$ let

m events be favourable to event A
k events be favourable to event B
r events be favourable to event AB

(naturally, $r \le k$, $r \le m$). If event B occurred, this means that
one of the events $A_j$ has occurred that is favourable to B. Given
this condition, r and only r events $A_j$ favourable to AB are
favourable to event A. Thus,

$$P(A/B) = \frac{r}{k} = \frac{r/n}{k/n} = \frac{P(AB)}{P(B)} \tag{1}$$

In exactly the same way we can derive

$$P(B/A) = \frac{P(AB)}{P(A)} \tag{1'}$$

Naturally, if B (A) is the impossible event, then Equation (1)
(and, respectively, (1')) becomes meaningless.

Each of Equations (1) and (1') is equivalent to the so-called
theorem of multiplication, according to which

$$P(AB) = P(A)\,P(B/A) = P(B)\,P(A/B) \tag{2}$$

that is, the probability of the product of two events is equal to the
product of the probability of one of the events by the conditional
probability of the other provided that the first has taken place.

The multiplication theorem is also applicable in the case when
one of the events A and B is the impossible event, since in this case,
along with $P(A) = 0$, we have the equations $P(A/B) = 0$ and
$P(AB) = 0$.

Conditional probability possesses all the properties of probability.
It is easy to see this by checking and finding that it satisfies
all the axioms formulated in the preceding section. Indeed, the first
axiom is satisfied in obvious fashion, since for each event A a
nonnegative function P(A/B) is defined in accordance with (1). If
$A = B$, then by the definition (1)

$$P(B/B) = \frac{P(BB)}{P(B)} = \frac{P(B)}{P(B)} = 1$$

The third axiom can be verified in the same simple fashion and
we leave this to the reader.

It will be noted that the probability space for conditional
probabilities is specified by the following triplet
$\{B, F_B, P(\cdot/B)\}$.

We say that event A is independent of event B if we have the 
following equation: 

P (A/B) = P (A) (3) 


that is, if the occurrence of event B does not alter the probability 
of event A. 

If event A is independent of B, then by virtue of (2) we have
the equation

$$P(A)\,P(B/A) = P(B)\,P(A)$$

From this we find:

$$P(B/A) = P(B) \tag{4}$$

that is, event B is also independent of A. Thus, the property of
the independence of the events is mutual.

If events A and B are independent, then events A and $\bar{B}$ are
also independent. Indeed, since

$$P(B/A) + P(\bar{B}/A) = 1$$

and by assumption $P(B/A) = P(B)$, it follows that

$$P(\bar{B}/A) = 1 - P(B) = P(\bar{B})$$

Whence we draw the important conclusion that if events A and B
are independent, then every pair of events $(\bar{A}, B)$,
$(A, \bar{B})$, $(\bar{A}, \bar{B})$ are also independent.





The notion of the independence of events plays a significant role 
in probability theory and its applications. Most of the results given 
in this book have been obtained on the assumption that the events 
under study are independent. 

For a practical determination of the independence of any events, 
one rarely checks to see that Equations (3) and (4) hold for them. The 
usual approach is intuition based on experience. 

For example, it is clear that a fall of heads in a toss of one coin 
does not alter the probability of heads (or tails) coming on another 
coin, if the coins are in no way connected (tied together rigidly, for 
example) during the tossing. In exactly the same way, the birth of 
a boy by one mother does not alter the probability of a boy (or a girl) 
being born by another mother. The events are independent. 

For independent events, the multiplication theorem takes on a
particularly simple form, namely, if events A and B are independent,
then

$$P(AB) = P(A) \cdot P(B)$$

We shall now generalize the notion of independence of two events
to a collection of several events.

Events $B_1, B_2, \ldots, B_s$ are collectively independent if for
any event $B_p$ of them and arbitrary $B_{i_1}, B_{i_2}, \ldots, B_{i_r}$
$(i_n \ne p)$ of that same number, events $B_p$ and
$B_{i_1} B_{i_2} \ldots B_{i_r}$ are mutually independent.

By virtue of the foregoing, this definition is equivalent to the
following: for any $1 \le i_1 < i_2 < \ldots < i_r \le s$ and r
$(1 \le r \le s)$

$$P(B_{i_1} B_{i_2} \ldots B_{i_r}) = P(B_{i_1})\,P(B_{i_2}) \ldots P(B_{i_r})$$

Note that for several events to be collectively independent it is
not sufficient for them to be pairwise independent. This is clear
from the following simple example. Suppose the faces of a
tetrahedron are coloured as follows: the first red (A), the second
green (B), the third blue (C), and the fourth in all three colours
(ABC). It is easy to see that the probability of red coming up in a
throw of the tetrahedron is equal to 1/2: there are four faces and two
of them have red. Thus,

$$P(A) = \frac{1}{2}$$

In exactly the same way we can calculate that

$$P(B) = P(C) = P(A/B) = P(B/C) = P(C/A) = P(B/A) = P(C/B) = P(A/C) = \frac{1}{2}$$

Events A, B, C are thus pairwise independent.





However, if we know that events B and C have occurred, then
event A has definitely occurred; that is

$$P(A/BC) = 1$$

Thus, events A, B, C are collectively dependent. The above example
is due to S. N. Bernstein.

Formula (1'), which in the case of the classical definition was
derived by us from the definition of conditional probability, will
be taken as a definition in the case of the axiomatic definition of
probability. So in the general case, for $P(A) > 0$, we have by
definition

$$P(B/A) = \frac{P(AB)}{P(A)}$$

(In the case $P(A) = 0$, the conditional probability $P(B/A)$ remains
undefined.) This enables us to carry over automatically to the
general notion of probability all the definitions and results of the
present section.
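
Bernstein's tetrahedron is small enough to verify exhaustively. In the sketch below the four faces are assumed equally probable; the function `P` and the list `faces` are names introduced for this illustration only.

```python
from fractions import Fraction

# Each face lists the colours painted on it; the faces are equally likely.
faces = [{"A"}, {"B"}, {"C"}, {"A", "B", "C"}]

def P(*colours):
    """Probability that the face that comes up shows all given colours."""
    hits = sum(1 for face in faces if set(colours) <= face)
    return Fraction(hits, len(faces))

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
pairwise = all(P(x, y) == P(x) * P(y) for x, y in pairs)
collective = P("A", "B", "C") == P("A") * P("B") * P("C")

print(pairwise, collective)              # True False
print(P("A", "B", "C") / P("B", "C"))    # 1
```

Every pair satisfies the product rule (1/4 = 1/2 · 1/2), but the triple does not (1/4 against 1/8), and the last line reproduces P(A/BC) = 1.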

Now suppose that event B can occur together with one and
only one of the n mutually exclusive events $A_1, A_2, \ldots, A_n$. In
other words, we put

$$B = \sum_{i=1}^{n} BA_i \tag{5}$$

where the events $BA_i$ and $BA_j$ with different subscripts i and j are
mutually exclusive. By the theorem of the addition of probabilities
we have

$$P(B) = \sum_{i=1}^{n} P(BA_i)$$

Utilizing the multiplication theorem we find

$$P(B) = \sum_{i=1}^{n} P(A_i)\,P(B/A_i)$$

This equation is called the formula of total probability and plays a
basic role throughout the subsequent theory.

By way of illustration we consider two examples.

Example 3. There are five urns:

2 urns of composition $A_1$ with two white balls and one black ball
each,

1 urn of composition $A_2$ with 10 black balls,

2 urns of composition $A_3$ with three white balls and one black ball
each.

An urn is selected at random and one ball is drawn from it randomly.
What is the probability that the drawn ball will be white (event B)?

Since the ball can be drawn only from urns of the 1st, 2nd, or 3rd
composition, it follows that

$$B = A_1 B + A_2 B + A_3 B$$

By the formula of total probability

$$P(B) = P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)$$

But

$$P(A_1) = \frac{2}{5}, \quad P(A_2) = \frac{1}{5}, \quad P(A_3) = \frac{2}{5},$$

$$P(B/A_1) = \frac{2}{3}, \quad P(B/A_2) = 0, \quad P(B/A_3) = \frac{3}{4}$$

And so

$$P(B) = \frac{2}{5} \cdot \frac{2}{3} + \frac{1}{5} \cdot 0 + \frac{2}{5} \cdot \frac{3}{4} = \frac{17}{30}$$
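
The arithmetic of Example 3 can be checked with exact fractions. The following sketch assumes the urn compositions stated in the example; `total_probability` is a helper name introduced here, not a term from the text.

```python
from fractions import Fraction as Fr

def total_probability(priors, likelihoods):
    """P(B) = sum of P(A_i) * P(B/A_i) over mutually exclusive A_i."""
    assert sum(priors) == 1, "the hypotheses must exhaust all cases"
    return sum(p * l for p, l in zip(priors, likelihoods))

# Two urns of the 1st composition, one of the 2nd, two of the 3rd.
priors = [Fr(2, 5), Fr(1, 5), Fr(2, 5)]
# Chance of a white ball from each composition: 2 of 3, 0 of 10, 3 of 4.
likelihoods = [Fr(2, 3), Fr(0), Fr(3, 4)]

print(total_probability(priors, likelihoods))  # 17/30
```

Working in `Fraction` rather than floating point keeps the result in the same form as the hand calculation: 4/15 + 0 + 3/10 = 17/30.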

Example 4. It is known that the probability of receiving k calls
at a telephone exchange during a time interval t is equal to $P_t(k)$
$(k = 0, 1, 2, \ldots)$.

Taking it that the numbers of calls in two adjacent intervals of
time are independent events, find the probability of s calls during a
time interval of duration 2t.

Solution. Denote by $A_\tau^k$ the event consisting of k calls arriving
during time τ. Obviously, we have the following equation:

$$A_{2t}^{s} = A_t^0 A_t^s + A_t^1 A_t^{s-1} + \ldots + A_t^s A_t^0$$

which means that event $A_{2t}^{s}$ may be regarded as the sum of $s+1$
mutually exclusive events consisting in the fact that during one
interval of time of duration t there are i calls and during the next
interval of the same duration there are $s - i$ calls
$(i = 0, 1, 2, \ldots, s)$. By the theorem of addition of probabilities,

$$P(A_{2t}^{s}) = \sum_{i=0}^{s} P(A_t^i A_t^{s-i})$$

By the theorem of multiplication of probabilities for independent
events,

$$P(A_t^i A_t^{s-i}) = P(A_t^i)\,P(A_t^{s-i}) = P_t(i) \cdot P_t(s-i)$$

Thus, if we put

$$P_{2t}(s) = P(A_{2t}^{s})$$

then

$$P_{2t}(s) = \sum_{i=0}^{s} P_t(i)\,P_t(s-i) \tag{6}$$

Later on we will see that for certain extremely general conditions

$$P_t(k) = \frac{(at)^k}{k!}\,e^{-at} \qquad (k = 0, 1, 2, \ldots)$$

where a is a certain constant.
From formula (6) we find

$$P_{2t}(s) = \sum_{i=0}^{s} \frac{(at)^s\,e^{-2at}}{i!\,(s-i)!} = (at)^s\,e^{-2at} \sum_{i=0}^{s} \frac{1}{i!\,(s-i)!}$$

But

$$\sum_{i=0}^{s} \frac{1}{i!\,(s-i)!} = \frac{1}{s!} \sum_{i=0}^{s} \frac{s!}{i!\,(s-i)!} = \frac{1}{s!}\,(1+1)^s = \frac{2^s}{s!}$$

And so

$$P_{2t}(s) = \frac{(2at)^s}{s!}\,e^{-2at} \qquad (s = 0, 1, 2, \ldots) \tag{7}$$

Thus, if for a time interval of duration t formula (7) is valid,
then for time intervals double that duration and, as it will readily
be seen, for any time intervals that are multiples of t, the nature
of the formula for the probability continues to hold.

We are now in a position to derive the important formulas of Bayes
or, as they are sometimes called, "Bayes' rule for the probability of
causes", or Bayes' theorem, or Bayes' formulas for the probability of
hypotheses. As before, let Equation (5) be valid. It is required to find
the probability of the event $A_i$ if it is known that B has already
happened. In accordance with the multiplication theorem we have

$$P(A_i B) = P(B)\,P(A_i/B) = P(A_i)\,P(B/A_i)$$

From this

$$P(A_i/B) = \frac{P(A_i)\,P(B/A_i)}{P(B)}$$

Using the formula of total probability we find

$$P(A_i/B) = \frac{P(A_i)\,P(B/A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B/A_j)}$$




These formulas are due to Bayes. The general procedure for applying
them to practical problems is as follows. Let event B take place under
diverse conditions, relative to the nature of which one can set up n
hypotheses: $A_1, A_2, \ldots, A_n$. For one reason or another we know
the probabilities $P(A_i)$ of these hypotheses prior to a trial. It is
also known that hypothesis $A_i$ imparts to event B a probability
$P(B/A_i)$. An experiment is performed in which B occurs. This should
cause a reappraisal of the probabilities of hypotheses $A_i$; Bayes'
formulas solve this problem quantitatively.

In artillery science we have what is called ranging fire, the purpose 
of which is to improve our knowledge of the firing conditions (proper 
aim, for example). Bayes’ formula is widely used in the theory of 
ranging fire. We confine ourselves to a strictly schematic example for 
the sole purpose of illustrating the type of problems solved by this 
formula. 

Example 5. There are five urns of the following compositions:

2 urns (composition $A_1$) with 2 white and 3 black balls each,

2 urns (composition $A_2$) with 1 white and 4 black balls each,

1 urn (composition $A_3$) with 4 white balls and 1 black ball.

A ball is chosen from one of the urns taken at random. It turned
out to be white (event B). What is the probability after the
experiment (a posteriori probability) that the ball was taken from the
urn of the third composition?

By hypothesis we have

$$P(A_1) = \frac{2}{5}, \quad P(A_2) = \frac{2}{5}, \quad P(A_3) = \frac{1}{5},$$

$$P(B/A_1) = \frac{2}{5}, \quad P(B/A_2) = \frac{1}{5}, \quad P(B/A_3) = \frac{4}{5}$$

By Bayes' formula we have

$$P(A_3/B) = \frac{P(A_3)\,P(B/A_3)}{P(A_1)\,P(B/A_1) + P(A_2)\,P(B/A_2) + P(A_3)\,P(B/A_3)} = \frac{\frac{1}{5} \cdot \frac{4}{5}}{\frac{2}{5} \cdot \frac{2}{5} + \frac{2}{5} \cdot \frac{1}{5} + \frac{1}{5} \cdot \frac{4}{5}} = \frac{4}{10} = \frac{2}{5}$$

In exactly the same way we find

$$P(A_1/B) = \frac{2}{5}, \quad P(A_2/B) = \frac{1}{5}$$
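
Bayes' formula applied to Example 5 can be written as a short routine. This is a sketch assuming the urn compositions given in the example; `posteriors` is a helper name introduced here for illustration.

```python
from fractions import Fraction as Fr

def posteriors(priors, likelihoods):
    """Return P(A_i/B) for every hypothesis A_i by Bayes' formula."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i) P(B/A_i)
    total = sum(joint)                                     # P(B)
    return [j / total for j in joint]

priors = [Fr(2, 5), Fr(2, 5), Fr(1, 5)]        # P(A_1), P(A_2), P(A_3)
likelihoods = [Fr(2, 5), Fr(1, 5), Fr(4, 5)]   # chance of white from each

post = posteriors(priors, likelihoods)
print([str(x) for x in post])  # ['2/5', '1/5', '2/5']
```

The a posteriori probabilities again sum to one. The single urn of the third composition, which had prior probability 1/5, becomes as probable as the two urns of the first composition taken together once a white ball has been observed.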


Sec. 10. Examples 

The following are somewhat more complicated examples of the use
of the foregoing theory.

Example 1*. Two players A and B continue a certain game to the
ultimate ruin of one of them. The capital of the first is a rubles, the
capital of the second is b rubles. The probability of winning each play
is p for player A and q for player B; $p + q = 1$ (there are no draws).
In each play, a win of one player (and, hence, a loss of the other) is
equal to one ruble. Find the probability of ruin of each of the players
(the outcomes of the separate plays are presumed to be independent).

Solution. Before beginning an analytical solution of the problem 
let us see what meaning a simple (elementary) event has here and how 
the probability of the event that interests us is defined. 

An elementary event is to be understood as an infinite sequence
of alternations of outcomes of the separate plays. For instance, an
elementary event $(A, \bar{A}, A, \bar{A}, \ldots)$ consists in the fact
that all odd plays are won by A and all even ones by player B. A random
event, the ruin of player A, consists of all elementary events in which
A loses his capital before player B does. Note that each elementary
event is a countable sequence consisting of letters $A$ and $\bar{A}$;
for this reason, in each elementary event that enters into the random
event (the ruin of player A) that interests us, there will be a
countable set of alternations of $A$ and $\bar{A}$ after the game
culminates in the ruin of player A.

We denote by $p_n(N)$ the probability of ruin of A during N plays
if he had n rubles before starting to play. It is easy to determine this
probability since the set of elementary events consists solely of a
finite number of elements. It is natural here to put the probability of
each elementary event equal to $p^m q^{N-m}$, where m is the number of
occurrences of $A$, and $N - m$ is the number of occurrences of
$\bar{A}$ in the total number N of occurrences of both letters. In the
same way, let $q_n(N)$ and $r_n(N)$ be, respectively, the probabilities
of loss of B and a draw after N plays.

It is clear that the numbers $p_n(N)$ and $q_n(N)$ do not diminish with
growth of N and the number $r_n(N)$ does not increase. We thus have
the limits

$$p_n = \lim_{N \to \infty} p_n(N), \quad q_n = \lim_{N \to \infty} q_n(N), \quad r_n = \lim_{N \to \infty} r_n(N)$$


We shall call these limits the probabilities of loss of players A
and B and of a draw, respectively, provided that at the start A
had n rubles and B had $a + b - n$ rubles. Since for any $N > 0$

$$p_n(N) + q_n(N) + r_n(N) = 1$$

it follows that in the limit

$$p_n + q_n + r_n = 1$$

* For this ruin problem we retain the classical formulation; but other
formulations are possible, such as: a physical particle lies on a
straight line at point 0 and every second is subjected to a random
impulse, as a result of which it is translated 1 cm to the right with
probability p or 1 cm to the left with probability $q = 1 - p$. What is
the probability that the particle will find itself to the right of a
point with coordinate b ($b > 0$) before it finds itself to the left of
a point with coordinate a ($a < 0$, a and b are integers)?

Further, it is obvious that

(1) if at the start player A has all the capital and B has
nothing, then

$$p_{a+b} = 0, \quad q_{a+b} = 1, \quad r_{a+b} = 0 \tag{1}$$

(2) if player A has nothing at the beginning, and B has all the
capital, then

$$p_0 = 1, \quad q_0 = 0, \quad r_0 = 0 \tag{1'}$$

If player A has n rubles prior to some play, then his ruin can
occur in two different ways: either he will win the next play and
then lose the entire game, or he will lose the next play and then the
game. Therefore, from the formula of total probability

$$p_n = p\,p_{n+1} + q\,p_{n-1}$$

We have a difference equation in $p_n$; it is easy to see that we
can write it in the following form:

$$q\,(p_n - p_{n-1}) = p\,(p_{n+1} - p_n) \tag{2}$$

Let us first examine the solution of this equation for
$p = q = \frac{1}{2}$. On this assumption,

$$p_{n+1} - p_n = p_n - p_{n-1} = \ldots = p_1 - p_0 = c$$

where c is a constant. From this we find

$$p_n = p_0 + nc$$

Since $p_0 = 1$ and $p_{a+b} = 0$, it follows that

$$p_n = 1 - \frac{n}{a+b}$$

Thus the probability of the ruin of player A is equal to

$$p_a = 1 - \frac{a}{a+b} = \frac{b}{a+b}$$

In like fashion we find that in the case of $p = \frac{1}{2}$ the
probability of the ruin of player B is

$$q_a = \frac{a}{a+b}$$

Whence it follows that for $p = q = \frac{1}{2}$

$$r_a = 0$$
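In the general case, for $p \ne q$, we find from (2) that

$$q^n \prod_{k=1}^{n} (p_k - p_{k-1}) = p^n \prod_{k=1}^{n} (p_{k+1} - p_k)$$

After simplifications and taking advantage of relations (1'), we find

$$p_{n+1} - p_n = \left(\frac{q}{p}\right)^{n} (p_1 - 1)$$

Consider the difference $p_{a+b} - p_n$; it is obvious that

$$p_{a+b} - p_n = \sum_{k=n}^{a+b-1} (p_{k+1} - p_k) = \sum_{k=n}^{a+b-1} \left(\frac{q}{p}\right)^{k} (p_1 - 1) = (p_1 - 1)\,\frac{\left(\frac{q}{p}\right)^{n} - \left(\frac{q}{p}\right)^{a+b}}{1 - \frac{q}{p}}$$

Since $p_{a+b} = 0$, it follows that

$$p_n = (1 - p_1)\,\frac{\left(\frac{q}{p}\right)^{n} - \left(\frac{q}{p}\right)^{a+b}}{1 - \frac{q}{p}}$$

and since $p_0 = 1$, we have

$$1 = (1 - p_1)\,\frac{1 - \left(\frac{q}{p}\right)^{a+b}}{1 - \frac{q}{p}}$$

Eliminating $p_1$ from the last two equations we find

$$p_n = \frac{\left(\frac{q}{p}\right)^{a+b} - \left(\frac{q}{p}\right)^{n}}{\left(\frac{q}{p}\right)^{a+b} - 1}$$

Hence the probability of ruin of player A is equal to

$$p_a = \frac{q^{a+b} - q^a p^b}{q^{a+b} - p^{a+b}}$$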




In exactly the same way we find that the probability of ruin of
player B, for $p \ne q$, is

$$q_a = \frac{q^a p^b - p^{a+b}}{q^{a+b} - p^{a+b}}$$

The last two formulas show that, in the general case, the
probability of a draw is zero:

$$r_a = 0$$

From these formulas we can draw the following conclusions: if
the capital of one of the players, say B, is incomparably greater
than the capital of A, so that for all practical purposes b may be
considered infinitely large compared with a, and the players are
of equal skill, then the ruin of B is practically impossible. The
conclusion will be quite different if A plays better than B and,
consequently, $p > q$. Assuming $b \to \infty$, we find

$$q_a \to 1 - \left(\frac{q}{p}\right)^{a}, \qquad p_a \to \left(\frac{q}{p}\right)^{a}$$

From this we infer that a skilful player with even a slight capital
can have less chance of ruin than a player with a large capital but
less skilful.
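
The closed-form ruin probabilities can be compared with a direct solution of the difference equation $p_n = p\,p_{n+1} + q\,p_{n-1}$ with $p_0 = 1$ and $p_{a+b} = 0$. The sketch below is one possible check, not the method of the text: it exploits the fact that the solution depends linearly on the unknown $p_1$. The function names and the sample values a = 3, b = 2, p = 3/5 are assumptions of this example.

```python
from fractions import Fraction as Fr

def ruin_closed_form(n, total, p):
    """p_n for a player holding n of `total` = a + b rubles."""
    q = 1 - p
    if p == q:
        return 1 - Fr(n, total)
    r = q / p
    return (r**total - r**n) / (r**total - 1)

def ruin_by_recurrence(total, p):
    """Solve p_n = p*p_{n+1} + q*p_{n-1}, p_0 = 1, p_total = 0.

    Write p_n = u_n + p_1 * v_n with (u_0, u_1) = (1, 0) and
    (v_0, v_1) = (0, 1); the condition p_total = 0 then fixes p_1.
    """
    q = 1 - p
    u, v = [Fr(1), Fr(0)], [Fr(0), Fr(1)]
    for n in range(1, total):
        u.append((u[n] - q * u[n - 1]) / p)
        v.append((v[n] - q * v[n - 1]) / p)
    p1 = -u[total] / v[total]
    return [u[n] + p1 * v[n] for n in range(total + 1)]

a, b, p = 3, 2, Fr(3, 5)   # hypothetical capitals and skill of player A
print(ruin_closed_form(a, a + b, p))        # 40/211
print(ruin_by_recurrence(a + b, p)[a])      # 40/211
print(ruin_closed_form(a, a + b, Fr(1, 2))) # 2/5
```

For equal skill the result is b/(a+b) = 2/5, while a modest advantage of p = 3/5 reduces the chance of ruin of A to 40/211, roughly 0.19.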

Some problems in physics and technology reduce to the ruin problem. 

Example 2. Find the probability that a machine tool operating
at time $t_0$ will not stop till time $t_0 + t$ if it is known that:
(1) this probability depends only on the length of the time interval
$(t_0, t_0 + t)$; (2) the probability that the machine will stop during
time interval $\Delta t$ is proportional to $\Delta t$ up to
infinitesimals of higher orders* with respect to $\Delta t$; (3) events
consisting in machine stoppage in nonoverlapping time intervals are
independent.

Solution. We denote the desired probability by $p(t)$. The probability
that the machine will stop within the time interval $\Delta t$ is

$$1 - p(\Delta t) = a\,\Delta t + o(\Delta t)$$

where a is some constant.


* Henceforward, to state that some quantity α is infinitely small
compared to β, we will write $\alpha = o(\beta)$. But if the ratio
$\alpha / \beta$ is bounded in absolute value, we shall write
$\alpha = O(\beta)$.




Let us determine the probability that the machine, which was in
operation at time $t_0$, will not stop up to time $t_0 + t + \Delta t$.
For this event to happen, it is necessary that the machine should not
stop during times of duration t and $\Delta t$; by virtue of the
multiplication theorem we thus have

$$p(t + \Delta t) = p(t) \cdot p(\Delta t) = p(t)\,(1 - a\,\Delta t - o(\Delta t))$$

And from this we have

$$\frac{p(t + \Delta t) - p(t)}{\Delta t} = -a\,p(t) - o(1) \tag{3}$$

Let us now pass to the limit, putting $\Delta t \to 0$; from the fact
that the right-hand side of (3) has a limit it follows that the
left-hand side has a limit too. And so we find

$$\frac{dp(t)}{dt} = -a\,p(t)$$

The solution of this equation is the function

$$p(t) = C e^{-at}$$

where C is a constant. This constant is found from the obvious
condition that $p(0) = 1$. Thus*

$$p(t) = e^{-at}$$

The first condition of the problem imposes great restrictions on 
the operating regime of the machine; however, there are places where 
it is fulfilled to a high degree of accuracy. An instance is the operation 
of an automatic loom. We note that many other problems reduce to 
the one we have considered, such, for example, as that of the distri¬ 
bution of probabilities of the mean free path of a molecule in the ki¬ 
netic theory of gases. 
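
The limiting argument of this example can be illustrated numerically: if the interval of length t is cut into many short steps and the machine survives each step with probability $1 - a\,\Delta t$ (the $o(\Delta t)$ term being dropped), the product of these factors approaches $e^{-at}$. The sketch below rests on that simplifying assumption; the values of `a` and `t` are arbitrary.

```python
import math

def survival_by_small_steps(a, t, steps):
    """Multiply the per-step survival probabilities 1 - a*dt,
    using independence of stoppages in nonoverlapping intervals."""
    dt = t / steps
    return (1 - a * dt) ** steps

a, t = 0.3, 4.0  # hypothetical stoppage intensity and time span
for steps in (10, 1_000, 100_000):
    print(steps, survival_by_small_steps(a, t, steps), math.exp(-a * t))
```

As the number of steps grows, the first printed value settles onto the second, the exponential law obtained from the differential equation.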

Example 3. Mortality tables are often compiled on the following
assumptions:

(1) the probability that a certain person will die between time t
and $t + \Delta t$ is

$$p(t, t + \Delta t) = a(t)\,\Delta t + o(\Delta t)$$

where $a(t)$ is a nonnegative continuous function;

(2) it is taken that the death of the given person (or his survival)
during the time interval $(t_1, t_2)$ under consideration does not
depend on what preceded $t_1$;


* Changing the reasoning, we can prove that the result obtained will be the
same if we do not assume that the second condition of the problem is fulfilled.





(3) the probability of death at the time of birth is zero.

Proceeding from these assumptions, find the probability of death
of person A before he reaches the age of $t$.

Solution. We denote by $\pi(t)$ the probability that A will live to
the age $t$ and we compute $\pi(t + \Delta t)$. From the original assumptions
we obviously have the equation

$$\pi(t + \Delta t) = \pi(t)\,\pi(t + \Delta t;\, t)$$

where $\pi(t + \Delta t;\, t)$ signifies the probability of living to the age $t + \Delta t$
if A has already reached age $t$. In accordance with the first and second
assumptions

$$\pi(t + \Delta t;\, t) = 1 - a(t)\,\Delta t - o(\Delta t)$$

Therefore

$$\pi(t + \Delta t) = \pi(t)\,[1 - a(t)\,\Delta t - o(\Delta t)]$$

From this we find that $\pi(t)$ satisfies the following differential
equation:

$$\frac{d\pi(t)}{dt} = -a(t)\,\pi(t)$$

Taking into account the third condition of the problem, the solution
of this equation will be the function

$$\pi(t) = e^{-\int_0^t a(z)\,dz}$$

Thus, the probability of dying before the age of $t$ is equal to

$$1 - \pi(t) = 1 - e^{-\int_0^t a(z)\,dz}$$

Mortality tables for adults are often compiled on the basis of
Makeham's formula, according to which

$$a(t) = \alpha + \beta e^{\gamma t}$$

where the constants $\alpha$, $\beta$, $\gamma$ are positive.* The derivation of this
formula is based on the assumption that a grown person can die
from causes that have nothing to do with age and from causes
depending on age, the probability of death increasing with age in
a geometrical progression. Given that supplementary assumption,

$$\pi(t) = e^{-\alpha t - \frac{\beta}{\gamma}\left(e^{\gamma t} - 1\right)}$$
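The survival probability under Makeham's formula can be checked against the general expression $\pi(t) = e^{-\int_0^t a(z)\,dz}$. The following sketch assumes Makeham's law in the form $a(t) = \alpha + \beta e^{\gamma t}$; the function names and the numerical constants are hypothetical and chosen only for illustration. It compares the closed form with a trapezoidal evaluation of the integral.

```python
import math


def hazard(t: float, alpha: float, beta: float, gamma: float) -> float:
    """Makeham-type intensity a(t) = alpha + beta * exp(gamma * t)."""
    return alpha + beta * math.exp(gamma * t)


def survival_closed_form(t: float, alpha: float, beta: float, gamma: float) -> float:
    """pi(t) = exp(-alpha*t - (beta/gamma) * (exp(gamma*t) - 1))."""
    return math.exp(-alpha * t - (beta / gamma) * (math.exp(gamma * t) - 1.0))


def survival_numeric(t: float, alpha: float, beta: float, gamma: float,
                     steps: int = 10_000) -> float:
    """pi(t) = exp(-integral_0^t a(z) dz), integral by the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0, alpha, beta, gamma) + hazard(t, alpha, beta, gamma))
    for i in range(1, steps):
        total += hazard(i * h, alpha, beta, gamma)
    return math.exp(-total * h)


if __name__ == "__main__":
    # Illustrative constants, not taken from any mortality table.
    alpha, beta, gamma = 0.001, 0.0001, 0.09
    for age in (20.0, 40.0, 60.0, 80.0):
        print(age, survival_closed_form(age, alpha, beta, gamma),
              survival_numeric(age, alpha, beta, gamma))
```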


Example 4. In modern nuclear physics, the intensity of a particle 
source is measured with Geiger-Muller counters. A particle entering 


* Their value is determined by the conditions under which the group of 
persons undergoing study live (social conditions, primarily). 





the counter generates a discharge in it that lasts time $\tau$, during which
the counter does not record any particles entering the counter. Find
the probability that the counter will count all particles entering it
during time $t$ if the following conditions are fulfilled:

(1) the probability that during time $t$ a total of $k$ particles will
enter the counter is independent of the number of particles that
entered the counter prior to this time interval;

(2) the probability that during the time interval from $t_0$ to $t_0 + t$
$k$ particles entered the counter is given by the formula*

$$p_k(t_0, t_0 + t) = \frac{(at)^k}{k!}\, e^{-at}$$

where $a$ is a positive constant;

(3) $\tau$ is a constant quantity.

Solution. Denote by $A(t)$ the event that all particles that entered
the counter during time $t$ were counted; by $B_k(t)$ the event
that during time $t$ a total of $k$ particles entered the counter.

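By virtue of the first condition of the problem, for $t \ge \tau$

$$P\{A(t + \Delta t)\} = P\{A(t)\}\,P\{B_0(\Delta t)\} + P\{A(t - \tau)\}\,P\{B_0(\tau)\}\,P\{B_1(\Delta t)\} + o(\Delta t)$$

and for $0 \le t \le \tau$

$$P\{A(t + \Delta t)\} = P\{A(t)\}\,P\{B_0(\Delta t)\} + P\{B_0(t)\}\,P\{B_1(\Delta t)\} + o(\Delta t)$$

For the sake of brevity, put $\pi(t) = P\{A(t)\}$; then on the basis
of the second and third conditions of the problem, for $0 \le t \le \tau$,

$$\pi(t + \Delta t) = \pi(t)\,e^{-a\Delta t} + e^{-at}\,a\,\Delta t\,e^{-a\Delta t} + o(\Delta t)$$

and for $t \ge \tau$

$$\pi(t + \Delta t) = \pi(t)\,e^{-a\Delta t} + \pi(t - \tau)\,a\,\Delta t\,e^{-a(\tau + \Delta t)} + o(\Delta t)$$

By passing to the limit as $\Delta t \to 0$ we find that for $0 \le t \le \tau$ we
have the equation

$$\frac{d\pi(t)}{dt} = -a\,\pi(t) + a\,e^{-at} \qquad (4)$$

and for $t \ge \tau$, the equation

$$\frac{d\pi(t)}{dt} = -a\left[\pi(t) - \pi(t - \tau)\,e^{-a\tau}\right] \qquad (5)$$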

* It will become clear later on why in this example and in Example 4 of
the previous section we considered that $p_k(t_0, t_0 + t) = \frac{(at)^k}{k!}\, e^{-at}$.





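From equation (4) we find that when $0 \le t \le \tau$

$$\pi(t) = e^{-at}\,(c + at)$$

From the condition

$$\pi(0) = 1$$

we determine the constant $c$. Finally, when $0 \le t \le \tau$,

$$\pi(t) = e^{-at}\,(1 + at) \qquad (6)$$

For $\tau \le t \le 2\tau$, the probability $\pi(t)$ is determined from the equation

$$\frac{d\pi(t)}{dt} = -a\left[\pi(t) - e^{-a(t - \tau)}\,(1 + a(t - \tau))\,e^{-a\tau}\right] = -a\left[\pi(t) - e^{-at}\,(1 + a(t - \tau))\right]$$

The solution of this equation gives us

$$\pi(t) = e^{-at}\left(c_1 + at + \frac{a^2 (t - \tau)^2}{2!}\right)$$

The constant $c_1$ may be found from the fact that according to (6)

$$\pi(\tau) = e^{-a\tau}\,(1 + a\tau)$$

Thus, $c_1 = 1$ and for $\tau \le t \le 2\tau$

$$\pi(t) = e^{-at}\left[1 + at + \frac{a^2 (t - \tau)^2}{2!}\right]$$

By the method of complete induction it may be proved that for
$(n - 1)\tau \le t \le n\tau$, the following equation is valid:

$$\pi(t) = e^{-at} \sum_{k=0}^{n} \frac{a^k\,[t - (k - 1)\tau]^k}{k!}$$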
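The closed-form result of Example 4 can be compared with a direct simulation. The sketch below rests on stated assumptions: particles arrive as a Poisson stream of intensity `a`, the counter is free at time 0, and "all particles counted" is read as "no particle arrives less than `tau` after the preceding one". The function names, the seed and the parameter values are hypothetical.

```python
import math
import random


def prob_all_counted(a: float, tau: float, t: float) -> float:
    """pi(t) = exp(-a t) * sum_{k=0}^{n} a^k (t - (k-1) tau)^k / k!,
    where n is chosen so that (n - 1) tau <= t <= n tau."""
    n = max(1, math.ceil(t / tau))
    total = 0.0
    for k in range(n + 1):
        total += (a * (t - (k - 1) * tau)) ** k / math.factorial(k)
    return math.exp(-a * t) * total


def simulate_all_counted(a: float, tau: float, t: float,
                         trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of runs in which every arrival in
    (0, t] comes at least tau after the previous arrival."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        previous = None
        clock = rng.expovariate(a)  # exponential gaps give a Poisson stream
        all_counted = True
        while clock <= t:
            if previous is not None and clock - previous < tau:
                all_counted = False  # particle arrived during a discharge
                break
            previous = clock
            clock += rng.expovariate(a)
        successes += all_counted
    return successes / trials


if __name__ == "__main__":
    a, tau, t = 1.0, 0.5, 1.8  # illustrative values only
    print(prob_all_counted(a, tau, t), simulate_all_counted(a, tau, t, 40_000))
```

The simulated frequency should lie close to the value of the formula, up to sampling error.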


EXERCISES 

A, B, C are random events.

1. What meaning do the following equations have:

(a) $ABC = A$;

(b) $A + B + C = A$?

2. Simplify the expressions

(a) $(A + B)(B + C)$;

(b) $(A + B)(A + \bar{B})$;

(c) $(A + B)(A + \bar{B})(\bar{A} + B)$.





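3. Prove the equations:

(a) $\overline{\bar{A}\,\bar{B}} = A + B$;

(b) $\overline{\bar{A} + \bar{B}} = AB$;

(c) $\overline{A_1 + A_2 + \ldots + A_n} = \bar{A}_1 \bar{A}_2 \ldots \bar{A}_n$;

(d) $\overline{A_1 A_2 \ldots A_n} = \bar{A}_1 + \bar{A}_2 + \ldots + \bar{A}_n$.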

4. A four-volume work is on a shelf in random order. What is the probability
that the volumes stand in the proper order from left to right or from right
to left?

5. The numbers 1, 2, 3, 4, 5 are written on five cards. Three cards are drawn 
in succession and at random from the deck; the resulting digits are written from 
left to right. What is the probability that the resulting three-digit number will 
be even? 

6. There are M defective items in a lot consisting of N items. From this lot
we select n (n < N) items at random. What is the probability that there will
be m defective items (m ≤ M) among them?

7. A quality control inspector examines items in a lot consisting of m items
of first grade and n second-grade items. An inspection of the first b items chosen
at random from the lot showed that they are all of grade two (b < m). What is
the probability that of the next two randomly chosen unchecked items at least
one will also be of grade two?

8. Using probabilistic arguments, prove the identity ($A > a$):

$$1 + \frac{A - a}{A - 1} + \frac{(A - a)(A - a - 1)}{(A - 1)(A - 2)} + \ldots + \frac{(A - a) \ldots 2 \cdot 1}{(A - 1) \ldots (a + 1)\,a} = \frac{A}{a}$$

Hint. An urn has A balls, of which a are white. The balls are drawn at random
without replacement. Find the probability that sooner or later a white ball
will be encountered.
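Before attempting the proof, the identity of Problem 8 can be checked for particular values. The sketch below is a numerical check in exact rational arithmetic, not a proof; it assumes the identity in the form $1 + \frac{A-a}{A-1} + \frac{(A-a)(A-a-1)}{(A-1)(A-2)} + \ldots = \frac{A}{a}$, and the function name is hypothetical.

```python
from fractions import Fraction


def identity_left_side(A: int, a: int) -> Fraction:
    """Sum 1 + (A-a)/(A-1) + (A-a)(A-a-1)/((A-1)(A-2)) + ...,
    ending with the term (A-a)...2*1 / ((A-1)...(a+1)a)."""
    total = Fraction(0)
    term = Fraction(1)
    for j in range(A - a + 1):
        total += term
        if j < A - a:
            # Pass from the j-th term to the (j+1)-th term.
            term *= Fraction(A - a - j, A - 1 - j)
    return total


if __name__ == "__main__":
    for A, a in [(3, 1), (10, 3), (12, 5)]:
        print(A, a, identity_left_side(A, a), Fraction(A, a))
```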

9. Draw one ball after another in succession from a box containing m white
balls and n black balls (m > n). What is the probability that there will come
a time when the number of selected black balls will be equal to the number of
white ones drawn?

10. A person wrote letters to n addressees, one letter in each envelope, and
then, at random, wrote one of the n addresses on each envelope. What is the
probability that at least one of the letters reached its destination?

11. An urn has n tickets with numbers from 1 to n. The tickets are drawn 
at random, one at a time (without replacement). What is the probability that 
at least in one selection the number of the extracted ticket will coincide with 
the number of the trial? 

12. From an urn containing n white balls and n black ones select at random
an even number of balls (all the different ways of drawing an even number of
balls are considered equally probable, irrespective of their number). Find the
probability that there will be the same number of black and white balls among
them.

13. The paradox of de Méré. What is more probable: to get one ace with
four dice, or to get one double ace in 24 throws of two dice?

14. Three points are thrown at random on a segment (0, a). Find the probability
that a triangle can be constructed out of line-segments equal to distances
from point 0 to the points of fall.

15. A rod of length l is broken at two randomly chosen points. What is the
probability that the pieces can be used to build a triangle?

16. A point is dropped at random onto line-segment AB of length a. Another
point is dropped at random on a line-segment BC of length b. What is the
probability that a triangle can be built from the lines: (1) from point A to the
first point; (2) between the two points that were dropped; (3) from the second
dropped point to point C?

17. A total of N points are dropped at random and independently of one 
another into a sphere of radius R. 

(a) What is the probability that the distance from the centre to the nearest
point will be at least r?

(b) What does the probability found in (a) approach if $R \to \infty$ and

$$\frac{N}{R^3} \to \frac{4}{3}\pi\lambda\,?$$

Note. The problem is taken from stellar astronomy: in the vicinity of the
sun, $\lambda \approx 0.0063$ if R is measured in parsecs.

18. The events $A_1, A_2, \ldots, A_n$ are independent; $P(A_k) = p_k$. Find the
probability of:

(a) the occurrence of at least one of these events; 

(b) the nonoccurrence of all these events; 

(c) the occurrence of exactly one event (it is immaterial which). 

19. Prove that if events A and B are mutually exclusive, P {A) > 0 and 
P (B) > 0, then events A and B are dependent. 


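20. Let $A_1, A_2, \ldots, A_n$ be random events. Prove the formula

$$P\left\{\sum_{k=1}^{n} A_k\right\} = \sum_{i=1}^{n} P(A_i) - \sum_{1 \le i < j \le n} P(A_i A_j) + \sum_{1 \le i < j < k \le n} P(A_i A_j A_k) - \ldots + (-1)^{n-1} P(A_1 A_2 \ldots A_n)$$

Employing this formula, solve Problems 10 and 11.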
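The formula of Problem 20 can be tried out on a small sample space before it is proved. The sketch below is an illustration under stated assumptions: the sample space is finite with equally likely points, events are represented as Python sets, and the function names are hypothetical. It computes the alternating sum and compares it with the probability of the union found directly.

```python
from fractions import Fraction
from itertools import combinations


def prob(event: set, space_size: int) -> Fraction:
    """Classical probability of an event in a space of equally likely points."""
    return Fraction(len(event), space_size)


def union_by_inclusion_exclusion(events: list, space_size: int) -> Fraction:
    """Alternating sum over all non-empty groups of events:
    + single events, - pairs, + triples, and so on."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(events, r):
            common = set.intersection(*group)
            total += sign * prob(common, space_size)
    return total


if __name__ == "__main__":
    size = 12  # points 0, 1, ..., 11; illustrative choice
    events = [set(range(0, size, 2)), set(range(0, size, 3)), {0, 1, 5, 7}]
    direct = prob(set().union(*events), size)
    print(union_by_inclusion_exclusion(events, size), direct)
```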

21. The probability that a molecule that collided with another molecule at
time t = 0 and that did not experience any other collisions up to time t will
experience a collision during the time interval between t and $t + \Delta t$ is equal to
$a\,\Delta t + o(\Delta t)$. Find the probability that the time of free motion (the time
between two successive collisions) will be greater than t.

22. Assuming that in the multiplication of bacteria by fission (division into
two bacteria) the probability of a bacterium dividing during a time interval $\Delta t$ is
equal to $a\,\Delta t + o(\Delta t)$ and is not dependent on the number of previous divisions
or on the number of bacteria present, find the probability that if at time 0
there was one bacterium there will be i bacteria at time t.



Chapter 2. SEQUENCES OF INDEPENDENT TRIALS


Sec. 11. Independent Trials. Bernoulli's Formulas 

In this chapter we take up the study of basic regularities involved
in one of the most important schemes of probability theory, that of
a sequence of independent trials. To this notion we ascribe the
following meaning.

By a trial we mean the realization of a specific set of conditions
that can give rise to an elementary event of a space $U$ of elementary
events (sample space). The mathematical model of a sequence of $n$
trials is a new sample space $U_n$ consisting of points $(e_1, e_2, \ldots, e_n)$
where $e_i$ is an arbitrary point of the space $U$ corresponding to a trial
with the number $i$.

Suppose the trial consists in tossing a die. The space of elementary
states consists of 6 points. The space $U_3$ which corresponds to three
trials consists of 216 points $(e_1, e_2, e_3)$.

Suppose a trial is regarded as a check of the duration of faultless
operation of a semiconductor device at a given voltage. The sample
space consists of the set of points of the half-line $0 \le e < \infty$. The space
$U_n$ consists of the set of points $(e_1, e_2, \ldots, e_n)$ whose coordinates
assume nonnegative values equal to the durations of faultless operation
of the devices numbered $1, 2, \ldots, n$, respectively.

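Let us assume that for the $s$th trial, the space $U$ is partitioned
into $k$ mutually exclusive random events $A_1^{(s)}, A_2^{(s)}, \ldots, A_k^{(s)}$; that
is, we assume that

$$A_1^{(s)} + A_2^{(s)} + \ldots + A_k^{(s)} = U, \qquad A_i^{(s)} A_j^{(s)} = V$$

($i \ne j$; $i, j = 1, 2, \ldots, k$; $s = 1, 2, \ldots, n$). Event $A_i^{(s)}$ will be
called the $i$th outcome in the $s$th trial. We denote the probability
of the $i$th outcome in the $s$th trial by $p_i^{(s)} = P(A_i^{(s)})$.

We denote by $A_{i_1}^{(1)} A_{i_2}^{(2)} \ldots A_{i_n}^{(n)}$ an event consisting of all the points
$(e_1, e_2, \ldots, e_n)$ of space $U_n$ for which $e_1 \in A_{i_1}^{(1)}$, $e_2 \in A_{i_2}^{(2)}$, $\ldots$, $e_n \in A_{i_n}^{(n)}$.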





If in the space $U_n$ the equation

$$P\left(A_{i_1}^{(1)} A_{i_2}^{(2)} \ldots A_{i_n}^{(n)}\right) = p_{i_1}^{(1)}\, p_{i_2}^{(2)} \ldots p_{i_n}^{(n)}$$

is valid for any $i_1, i_2, \ldots, i_n$ ($1 \le i_1, i_2, \ldots, i_n \le k$), the trials are
termed independent.*

In the future we shall confine ourselves to the case where the
probabilities of events $A_i^{(s)}$ do not depend on the number of the trial;
then we denote $p_i^{(s)} = p_i$ ($i = 1, 2, \ldots, k$); since the outcomes
are mutually exclusive and exhaustive, it is obvious that we have
$\sum_{i=1}^{k} p_i = 1$. This scheme was first considered by James Bernoulli in the
highly important special case of $k = 2$. The case of $k = 2$ is therefore
known as the Bernoulli scheme. Ordinarily, in the Bernoulli scheme,

$$p_1 = p, \qquad p_2 = 1 - p = q.$$

The definition of independent trials gives the following result:

Theorem. If $n$ given trials are independent, then any $m$ of them are
also independent.

For the sake of simplicity, we confine ourselves to the case of
$m = n - 1$, since there are no difficulties in passing to the general case.
Indeed, we have the obvious equality

$$A_{i_1}^{(1)} A_{i_2}^{(2)} \ldots A_{i_{n-1}}^{(n-1)} = \sum_{j=1}^{k} A_{i_1}^{(1)} A_{i_2}^{(2)} \ldots A_{i_{n-1}}^{(n-1)} A_j^{(n)}$$

from which it follows that

$$P\left\{A_{i_1}^{(1)} \ldots A_{i_{n-1}}^{(n-1)}\right\} = \prod_{s=1}^{n-1} P\left\{A_{i_s}^{(s)}\right\} \sum_{j=1}^{k} P\left\{A_j^{(n)}\right\} = \prod_{s=1}^{n-1} P\left\{A_{i_s}^{(s)}\right\}$$

By definition this means that the first $n - 1$ trials are independent.

It is easy to prove the following theorem, which elucidates the 
conditions of independence of trials. 

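Theorem. For $n$ trials to be independent it is necessary and sufficient
to satisfy the conditions

$$P\left(A_{q_1}^{(i_1)} A_{q_2}^{(i_2)} \ldots A_{q_m}^{(i_m)}\right) = \prod_{s=1}^{m} P\left(A_{q_s}^{(i_s)}\right)$$

for any group of numbers $i_1, i_2, \ldots, i_m$ ($1 \le i_1, \ldots, i_m \le n$) and any
$m, q_1, q_2, \ldots, q_m$ ($1 \le m \le n$, $1 \le q_1, q_2, \ldots, q_m \le k$).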

We shall not take up the proof of this proposition here, partly 
because its practical verification involves great difficulties. 


* It is possible to ascribe a broader meaning to the scheme of a sequence of
independent trials and consider that the number of outcomes and the
probabilities of these outcomes depend on the trial number. These more general
constructions will not be considered.





A detailed investigation of such sequences of trials deserves the
fullest attention both because of their immediate value in probability
theory and in applications and also by virtue of the possibility,
revealed in the process of the development of probability theory, of
generalizing the regularities first discovered in studies of the scheme
of a sequence of independent trials, in particular the Bernoulli scheme.
Many of the facts elicited in this special scheme were later to serve
as a guide in the study of more sophisticated schemes. This remark
refers both to the past and to the modern development of probability
theory, which will become evident later on in discussing examples
of the law of large numbers and the DeMoivre-Laplace theorem.

The most elementary problem involving a scheme of independent
trials consists in determining the probability $P_n(m)$ that in $n$ trials
an event $A$ will occur $m$ times, and that in the remaining $n - m$ trials
the contrary event $\bar{A}$ will occur.

We first find the probability that event $A$ occurs in $m$ specific
trials (for instance, in trials with the numbers $s_1, s_2, \ldots, s_m$) and does
not occur in the remaining $n - m$ trials. By the multiplication theorem
for independent events, this probability is

$$p^m q^{n-m}$$

By the theorem of addition of probabilities, the desired probability
$P_n(m)$ is equal to the sum of the above computed probabilities
for all the different modes of $m$ occurrences of the event and $n - m$
nonoccurrences from among $n$ trials. Combinatorial theory states
that the number of such ways is $C_n^m = \frac{n!}{m!\,(n - m)!}$; consequently,
the probability sought for is

$$P_n(m) = C_n^m\, p^m q^{n-m} \qquad (1)$$

Since all possible mutually exclusive outcomes of $n$ trials consist
in the occurrence of event $A$ zero times, 1 time, 2 times, ..., $n$
times, it is clear that

$$\sum_{m=0}^{n} P_n(m) = 1$$

This relationship can also be derived, without probabilistic reasoning,
from the equation

$$\sum_{m=0}^{n} P_n(m) = (p + q)^n = 1^n = 1$$

It will readily be seen that the probability $P_n(m)$ is equal to the
coefficient of $x^m$ in the binomial expansion of $(q + px)^n$ in powers of
$x$; by virtue of this property the collection of probabilities $P_n(m)$
is called the law of binomial probability distribution.
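Both properties just stated are easy to confirm for small $n$. The sketch below is an illustration in exact rational arithmetic with hypothetical function names: it computes $P_n(m) = C_n^m p^m q^{n-m}$, checks that the probabilities add up to one, and compares them with the coefficients obtained by multiplying out $(q + px)^n$ term by term.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, m: int, p: Fraction) -> Fraction:
    """P_n(m) = C(n, m) * p^m * q^(n - m) with q = 1 - p."""
    q = 1 - p
    return comb(n, m) * p**m * q**(n - m)


def expand_q_plus_px(n: int, p: Fraction) -> list:
    """Coefficients of (q + p*x)^n, found by repeated multiplication
    of the current polynomial by (q + p*x)."""
    q = 1 - p
    coeffs = [Fraction(1)]
    for _ in range(n):
        new = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            new[i] += c * q        # contribution of the constant term q
            new[i + 1] += c * p    # contribution of the term p*x
        coeffs = new
    return coeffs


if __name__ == "__main__":
    n, p = 6, Fraction(1, 3)  # illustrative values only
    print([binomial_pmf(n, m, p) for m in range(n + 1)])
    print(expand_q_plus_px(n, p))
```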

By modifying our reasoning somewhat, the reader will easily see
that if, in each of the trials, one of $k$ mutually exclusive events $A_i$
can occur and the probability of occurrence of event $A_i$ in each trial
is $p_i$, the probability of occurrence of event $A_1$, $m_1$ times in the course
of $n$ trials, of event $A_2$, $m_2$ times, ..., and of event $A_k$, $m_k$ times
($m_1 + m_2 + \ldots + m_k = n$) is

$$P_n(m_1, m_2, \ldots, m_k) = \frac{n!}{m_1!\, m_2! \ldots m_k!}\, p_1^{m_1} p_2^{m_2} \ldots p_k^{m_k} \qquad (1')$$

It is also easy to see that this probability is the coefficient of
$x_1^{m_1} x_2^{m_2} \ldots x_k^{m_k}$ in the expansion of the polynomial
$(p_1 x_1 + p_2 x_2 + \ldots + p_k x_k)^n$ in powers of $x$.

Having in view the formulation of general problems involving 
the independent-trial scheme, we shall consider some numerical 
examples. We will not work out the desired probabilities to the end 
but will leave them until convenient methods have been prepared. 

Example 1. There are two vessels A and B, each with a volume
of 1 dm³. Each contains $2.7 \times 10^{22}$ molecules of gas. The vessels are
brought into contact so that there is a free exchange of molecules
between them. What is the probability that after 24 hours one of
the vessels will have at least one ten-thousand millionth part more
molecules than the other?

For each molecule, the probability of being in one or the other
vessel 24 hours later is the same and is one half. Thus, it is as if
$5.4 \times 10^{22}$ trials were performed, for each of which the probability
of being in vessel A is equal to $\frac{1}{2}$. Let $\mu$ be the number of molecules
that go to vessel A and, hence, $5.4 \times 10^{22} - \mu$ is the number of
molecules that go to vessel B. We have to determine the probability that

$$|\mu - (5.4 \times 10^{22} - \mu)| \ge 5.4 \times 10^{12}$$

in other words, we must find the probability

$$p = P\{|\mu - 2.7 \times 10^{22}| \ge 2.7 \times 10^{12}\}$$

By the addition theorem

$$p = \sum P_n(m)$$

where the sum is extended to those values of $m$ for which

$$|m - 2.7 \times 10^{22}| \ge 2.7 \times 10^{12}$$

Example 2. The probability that an item of a certain kind of
production will be defective is equal to 0.005. What is the probability
that out of 10,000 randomly chosen items there will be (a)
exactly 40, (b) no more than 70 defective ones?

In our example, $n = 10{,}000$, $p = 0.005$. Therefore, by formula (1)
we find

(a) $P_{10{,}000}(40) = C_{10{,}000}^{40}\,(0.995)^{9960}\,(0.005)^{40}$

The probability $P\{\mu \le 70\}$ that the number of defective items
will not turn out to be more than seventy is equal to the sum of
probabilities of the number of defective items being equal to 0, 1, 2, ...,
70. Thus,

(b) $P\{\mu \le 70\} = \sum_{m=0}^{70} P_n(m) = \sum_{m=0}^{70} C_{10{,}000}^{m}\,(0.995)^{10{,}000 - m}\,(0.005)^{m}$
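With a computer the sums of Example 2 can be evaluated directly, provided the very large binomial coefficients are handled through logarithms. The sketch below is one possible approach, not the method of the text: it uses the standard-library function `math.lgamma` to obtain $\log n!$, and the function names are hypothetical.

```python
import math


def log_binomial_pmf(n: int, m: int, p: float) -> float:
    """log of C(n, m) * p^m * (1 - p)^(n - m), using log-gamma for factorials."""
    log_comb = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
    return log_comb + m * math.log(p) + (n - m) * math.log1p(-p)


def binomial_pmf(n: int, m: int, p: float) -> float:
    """P_n(m) computed through its logarithm to avoid overflow."""
    return math.exp(log_binomial_pmf(n, m, p))


def binomial_cdf(n: int, m_max: int, p: float) -> float:
    """P{mu <= m_max} as the sum of P_n(m) for m = 0, 1, ..., m_max."""
    return sum(binomial_pmf(n, m, p) for m in range(m_max + 1))


if __name__ == "__main__":
    n, p = 10_000, 0.005
    print("(a)", binomial_pmf(n, 40, p))
    print("(b)", binomial_cdf(n, 70, p))
```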

The above examples demonstrate that a direct computation of
the probabilities from formula (1) (and also from formula (1')) comes
up against formidable technical difficulties; the problem therefore
arises of finding simple approximate formulas for the probabilities
$P_n(m)$ and also for sums of the form

$$\sum_{m=s}^{t} P_n(m)$$

for large values of $n$. These problems will be solved in Secs. 12
and 13. We shall now attempt to establish elementary facts dealing
with the behaviour of probabilities $P_n(m)$ for constant $n$. We
begin with a study of $P_n(m)$ as a function of $m$. It is easy to
compute that for $0 \le m < n$,

$$\frac{P_n(m + 1)}{P_n(m)} = \frac{n - m}{m + 1} \cdot \frac{p}{q}$$

whence it follows that

$$P_n(m + 1) > P_n(m)$$

if $(n - m)p > (m + 1)q$, that is, if $np - q > m$;

$$P_n(m + 1) = P_n(m)$$

if $m = np - q$, and, finally,

$$P_n(m + 1) < P_n(m)$$

if $m > np - q$.

We see that the probability $P_n(m)$ first increases with increasing
$m$ and then reaches a maximum, after which it diminishes as $m$
continues to increase. If $np - q$ is an integer, the probability $P_n(m)$
assumes maximal value for two values of $m$, namely for $m_0 = np - q$
and $m_0' = np - q + 1 = np + p$. But if $np - q$ is not an integer, the
maximal value is attained by the probability $P_n(m)$ for $m = m_0$, which
is equal to the least whole number greater than $np - q$. The number $m_0$
is called the most probable value of $\mu$. We have seen that if $np - q$
is an integer, then $\mu$ has two most probable values: $m_0$ and $m_0' = m_0 + 1$.

We note that if $np - q < 0$, then

$$P_n(0) > P_n(1) > \ldots > P_n(n)$$

and if $np - q = 0$, then

$$P_n(0) = P_n(1) > P_n(2) > \ldots > P_n(n)$$

Later on we will see that for large values of $n$ all the
probabilities $P_n(m)$ become close to zero, but only for $m$ close to the
most probable value of $\mu$ are the probabilities $P_n(m)$ noticeably
different from zero. We will prove this later on; for the present we
illustrate what has been said with a numerical example.

Example 3. Let $n = 50$, $p = \frac{1}{3}$.

There are two most probable values: $m_0 = np - q = 16$ and
$m_0 + 1 = 17$.

The values of the probabilities $P_n(m)$ are given in Table 5 to
four decimal places.


TABLE 5

    m     P_n(m)        m     P_n(m)        m     P_n(m)
   < 5    0.0000       13     0.0679       23
    5     0.0001       14     0.0879       24
    6     0.0004       15     0.1077       25
    7     0.0012       16     0.1178       26
    8     0.0033       17     0.1178       27
    9     0.0077       18     0.1080       28     0.0005
   10                  19     0.0910       29     0.0002
   11                  20                  30     0.0001
   12     0.0470       21                 > 30    0.0000
                       22     0.0332
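The rule for the most probable value can be checked against the probabilities themselves. The sketch below is illustrative, with hypothetical function names; it works in exact rational arithmetic, derives the most probable value (or values) of $\mu$ from the quantity $np - q$, and compares the outcome with a direct search for the largest $P_n(m)$ in the case $n = 50$, $p = 1/3$.

```python
from fractions import Fraction
from math import comb, floor


def binomial_pmf(n: int, m: int, p: Fraction) -> Fraction:
    """P_n(m) = C(n, m) * p^m * (1 - p)^(n - m)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)


def most_probable_values(n: int, p: Fraction) -> list:
    """Most probable value(s) of mu from the rule based on np - q (0 < p < 1)."""
    boundary = n * p - (1 - p)  # the quantity np - q
    if boundary < 0:
        return [0]
    if boundary == floor(boundary):
        # np - q is an integer: two equally probable maxima
        return [int(boundary), int(boundary) + 1]
    # otherwise: the least whole number greater than np - q
    return [floor(boundary) + 1]


if __name__ == "__main__":
    n, p = 50, Fraction(1, 3)
    print(most_probable_values(n, p))
    for m in range(12, 23):
        print(m, round(float(binomial_pmf(n, m, p)), 4))
```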






















Sec. 12. The Local Limit Theorem 


When we considered the numerical examples of the previous section,
we came to the conclusion that for large values of $n$ and $m$, calculation
of the probabilities $P_n(m)$ from formula (1), Sec. 11, involves
considerable difficulties. The necessity arises of having asymptotic
formulas that permit calculating these probabilities with a sufficient
degree of accuracy. A formula of this kind was first found by DeMoivre
in 1730 for the special case of the Bernoulli scheme when $p = q = \frac{1}{2}$,
and then was generalized by Laplace to the case of arbitrary $p$
different from 0 and 1.

This formula became known as the local Laplace theorem; in order
to restore historical justice we shall call it the local theorem of
DeMoivre-Laplace.

We introduce the notation

$$x = \frac{m - np}{\sqrt{npq}} \qquad (1)$$

It is clear that $x$ depends both on $n$ and $p$ and on $m$.


Local Theorem of DeMoivre-Laplace. If the probability of
occurrence of some event A in n independent trials is constant and is
equal to $p$ ($0 < p < 1$), then the probability $P_n(m)$ that in
these trials event A will occur exactly $m$ times satisfies the relation

$$\sqrt{npq}\;P_n(m) : \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} \to 1 \quad (n \to \infty) \qquad (2)$$

uniformly in all $m$ for which $x$ lies in some finite interval.

Proof. The proof that we give is based on Stirling's formula
(which is familiar from the course of analysis):

$$s! = \sqrt{2\pi s}\; s^s e^{-s} e^{\theta_s}$$

in which the remainder exponent $\theta_s$ satisfies the inequality

$$|\theta_s| \le \frac{1}{12 s} \qquad (3)$$
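The bound (3) on the remainder exponent in Stirling's formula can be observed numerically. The sketch below assumes Stirling's formula in the form $s! = \sqrt{2\pi s}\, s^s e^{-s} e^{\theta_s}$ and recovers $\theta_s$ from $\log s!$, computed with the standard-library function `math.lgamma`; the function name is hypothetical.

```python
import math


def stirling_theta(s: int) -> float:
    """theta_s defined by s! = sqrt(2*pi*s) * s^s * e^(-s) * e^(theta_s)."""
    log_main_part = 0.5 * math.log(2.0 * math.pi * s) + s * math.log(s) - s
    return math.lgamma(s + 1) - log_main_part  # lgamma(s + 1) = log(s!)


if __name__ == "__main__":
    for s in (1, 2, 5, 10, 50, 100):
        print(s, stirling_theta(s), 1.0 / (12.0 * s))
```

For each listed value the computed $\theta_s$ is positive and does not exceed $1/(12s)$, in agreement with inequality (3).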


We note that formula (1) may be written differently:

$$m = np + x\sqrt{npq} \qquad (1')$$

whence it follows that

$$n - m = nq - x\sqrt{npq} \qquad (1'')$$

The last two equations permit us to conclude that if $x$ remains
bounded by certain constants $a$ and $b$, then both $m$ and $n - m$ tend
to infinity as $n \to \infty$.





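Employment of Stirling's formula gives us

$$P_n(m) = \frac{n!}{m!\,(n - m)!}\, p^m q^{n-m} = \sqrt{\frac{n}{2\pi m (n - m)}} \cdot \frac{n^n p^m q^{n-m}}{m^m (n - m)^{n-m}}\, e^{\theta}$$

where $\theta = \theta_n - \theta_m - \theta_{n-m}$. By virtue of estimate (3) we have

$$|\theta| < \frac{1}{12}\left(\frac{1}{n} + \frac{1}{m} + \frac{1}{n - m}\right) \qquad (4)$$

If $a \le x \le b$, then the corresponding values of $m$ and $n - m$ satisfy
the inequalities

$$m \ge np + a\sqrt{npq} = np\left(1 + a\sqrt{\frac{q}{np}}\right),$$

$$n - m \ge nq - b\sqrt{npq} = nq\left(1 - b\sqrt{\frac{p}{nq}}\right)$$

and, hence, for all indicated values of $m$ and $n - m$ we have the
estimate

$$|\theta| < \frac{1}{12n}\left[1 + \frac{1}{p + a\sqrt{\frac{pq}{n}}} + \frac{1}{q - b\sqrt{\frac{pq}{n}}}\right] \qquad (5)$$

This shows us that no matter what the interval $(a, b)$, in this
interval the quantity $\theta$ tends to zero uniformly in $x$ as $n \to \infty$,
and consequently the factor $e^{\theta}$ under the same conditions uniformly
approaches unity.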

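We now consider the quantity

$$\log A_n = \log\left[\left(\frac{np}{m}\right)^{m} \left(\frac{nq}{n - m}\right)^{n-m}\right] = -m \log\frac{m}{np} - (n - m)\log\frac{n - m}{nq} =$$

$$= -(np + x\sqrt{npq})\,\log\left(1 + x\sqrt{\frac{q}{np}}\right) - (nq - x\sqrt{npq})\,\log\left(1 - x\sqrt{\frac{p}{nq}}\right)$$

Within the conditions of the theorem, the quantities $x\sqrt{\frac{q}{np}}$
and $x\sqrt{\frac{p}{nq}}$ may be made arbitrarily small for sufficiently large
$n$ and so we can take advantage of expanding the functions
$\log\left(1 + x\sqrt{\frac{q}{np}}\right)$ and $\log\left(1 - x\sqrt{\frac{p}{nq}}\right)$ in a power series.
Confining ourselves to the first two terms, we find

$$\log\left(1 + x\sqrt{\frac{q}{np}}\right) = x\sqrt{\frac{q}{np}} - \frac{q x^2}{2np} + O\left(n^{-3/2}\right)$$

$$\log\left(1 - x\sqrt{\frac{p}{nq}}\right) = -x\sqrt{\frac{p}{nq}} - \frac{p x^2}{2nq} + O\left(n^{-3/2}\right)$$

The estimates of the remainder terms are uniform in any finite
interval of variation of $x$. Thus

$$\log A_n = -(np + x\sqrt{npq})\left[x\sqrt{\frac{q}{np}} - \frac{q x^2}{2np} + O\left(n^{-3/2}\right)\right] - (nq - x\sqrt{npq})\left[-x\sqrt{\frac{p}{nq}} - \frac{p x^2}{2nq} + O\left(n^{-3/2}\right)\right] = -\frac{x^2}{2} + O\left(\frac{1}{\sqrt{n}}\right)$$

Hence, the following relation holds uniformly in $x$ in any finite
interval $a \le x \le b$:

$$A_n : e^{-\frac{x^2}{2}} \to 1 \qquad (6)$$

Further, we have

$$\sqrt{\frac{n}{m(n - m)}} = \frac{1}{\sqrt{npq}} \cdot \frac{1}{\sqrt{\left(1 + x\sqrt{\frac{q}{np}}\right)\left(1 - x\sqrt{\frac{p}{nq}}\right)}} \qquad (7)$$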

Under the conditions of the theorem, the second factor on the
right-hand side of this equation tends to unity as $n \to \infty$ and does so uniformly
in each finite interval of variation of $x$.

It will readily be seen that relations (5), (6) and (7) prove our theorem.

Now we can complete our computations in the examples of the 
preceding section. 


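Example. In Example 2, Sec. 11, we had to determine $P_n(m)$ for
$n = 10{,}000$, $m = 40$, $p = 0.005$. From the theorem just proved we have

$$P_n(m) \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{1}{2}\left(\frac{m - np}{\sqrt{npq}}\right)^2}$$

For our example

$$\sqrt{npq} = \sqrt{10{,}000 \times 0.005 \times 0.995} = \sqrt{49.75} \approx 7.05$$

$$\frac{m - np}{\sqrt{npq}} = \frac{40 - 50}{7.05} \approx -1.42$$

Consequently,

$$P_n(m) \sim \frac{1}{7.05\sqrt{2\pi}}\, e^{-\frac{1.42^2}{2}}$$

The function

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$

is tabulated; an abbreviated table of the values of this function
is given at the end of the book (see the Appendix). From this
table we find

$$P_n(m) \sim 0.0206$$

Exact computations without using the DeMoivre-Laplace theorem
give

$$P_n(m) = 0.0197$$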
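The approximation used in this example is short enough to evaluate in a few lines. The sketch below, with hypothetical function names, computes $\varphi(x)/\sqrt{npq}$ for $x = (m - np)/\sqrt{npq}$ without rounding the intermediate quantities, so its result may differ from the table-based value in the last decimal place.

```python
import math


def phi(x: float) -> float:
    """phi(x) = exp(-x^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def local_approximation(n: int, m: int, p: float) -> float:
    """Local DeMoivre-Laplace approximation phi(x) / sqrt(npq) to P_n(m)."""
    q = 1.0 - p
    scale = math.sqrt(n * p * q)
    x = (m - n * p) / scale
    return phi(x) / scale


if __name__ == "__main__":
    print(local_approximation(10_000, 40, 0.005))
```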

To illustrate the nature of the approximations given by the 
DeMoivre-Laplace theorem and also for a geometric explanation of 
the analytical transformations carried out in its proof, we shall 
consider a numerical example. 


TABLE 6
n = 4

  m                   0         1         2         3         4
  P_n(m)           0.4096    0.4096    0.1536    0.0256    0.0016
  x                -1.00      0.25      1.50      2.75      4.00
  √(npq)·P_n(m)    0.3277    0.3277    0.1229    0.0205    0.0013
  φ(x)             0.2420    0.3867    0.1295    0.0091    0.0001


Let the probability $p$ be equal to 0.2. Tables 6 to 9 give the
values of $m$, $x = \frac{m - np}{\sqrt{npq}}$, the probabilities $P_n(m)$, the quantities
$\sqrt{npq}\,P_n(m)$, and also the function $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$ to the fourth
decimal place for the number of trials, respectively, $n = 4$, 25, 100,
and 400. In Fig. 8 the ordinates depict the values of the
probabilities $P_n(m)$ for various integral values of the abscissa $m$. It
will be seen that $P_n(m)$ uniformly falls off with increasing $n$.
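The rows of these tables follow directly from the definitions, so they can be regenerated for any $n$. The sketch below uses a hypothetical function name and assumes $p = 0.2$ as in the text; it returns, for given $n$ and $m$, the four tabulated quantities $x$, $P_n(m)$, $\sqrt{npq}\,P_n(m)$ and $\varphi(x)$.

```python
import math


def table_row(n: int, m: int, p: float = 0.2):
    """Return (x, P_n(m), sqrt(npq) * P_n(m), phi(x)) for one table row."""
    q = 1.0 - p
    scale = math.sqrt(n * p * q)
    x = (m - n * p) / scale
    prob = math.comb(n, m) * p**m * q**(n - m)
    phi_x = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return x, prob, scale * prob, phi_x


if __name__ == "__main__":
    for n in (4, 25):
        print("n =", n)
        for m in range(0, min(n, 10) + 1):
            x, prob, scaled, phi_x = table_row(n, m)
            print(f"{m:3d} {x:7.2f} {prob:8.4f} {scaled:8.4f} {phi_x:8.4f}")
```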




[Fig. 8: the probabilities P_n(m) plotted against m for the values of n under consideration]


So that in the figure the points $[m, P_n(m)]$ for the values of $n$
under consideration should not merge with the x-axis, we choose
radically different scales for the coordinate axes.

TABLE 7
n = 25

    m        x       P_n(m)    √(npq)·P_n(m)     φ(x)
    0      -2.5      0.0037       0.0075        0.0175
    1      -2.0      0.0236       0.0472        0.0540
    2      -1.5      0.0708       0.1417        0.1295
    3      -1.0      0.1358       0.2715        0.2420
    4      -0.5      0.1867       0.3734        0.3521
    5       0.0      0.1960       0.3920        0.3989
    6       0.5      0.1633       0.3267        0.3521
    7       1.0      0.1108       0.2217        0.2420
    8       1.5      0.0623       0.1247        0.1295
    9       2.0      0.0294       0.0589        0.0540
   10       2.5      0.0118       0.0236        0.0175
   11       3.0      0.0040       0.0080        0.0044
   12       3.5      0.0012       0.0023        0.0009
   13       4.0      0.0003       0.0006        0.0001
   14       4.5      0.0000       0.0000        0.0000
  > 14               0.0000       0.0000        0.0000








Consideration of abscissas

$$x_n = \frac{m - np}{\sqrt{npq}}$$

and ordinates $y_n(m) = \sqrt{npq}\,P_n(m)$ in place of abscissas $m$ and ordinates $P_n(m)$ signifies:

(1) a translation of the origin to the point $(np, 0)$ located near
the abscissa corresponding to the maximal ordinate $P_n(m)$;

(2) an increase in the scale unit along the x-axis by a factor
$\sqrt{npq}$ (in other words, compressing the figure along the x-axis
$\sqrt{npq}$ times);

(3) a decrease in the scale unit along the y-axis by a factor
$\sqrt{npq}$ (in other words, expanding the figure along the y-axis
$\sqrt{npq}$ times).

Figure 9 (a, b, c) shows: the curve $y = \varphi(x)$ and the points
$[m, P_n(m)]$, i.e. the points $[x_n, y_n(m)]$ transformed in the way
just described. We see that already for $n = 25$ the points $[x_n,
y_n(m)]$ merge in the graph with the corresponding points of the
curve $y = \varphi(x)$. This coincidence becomes still better for values of
$n$ greater than 25.


TABLE 8
n = 100

    m        x       P_n(m)    √(npq)·P_n(m)     φ(x)
    8      -3.00     0.0006       0.0023        0.0044
    9      -2.75     0.0015       0.0059        0.0091
   10      -2.50     0.0034       0.0134        0.0175
   11      -2.25     0.0069       0.0275        0.0317
   12      -2.00     0.0127       0.0510        0.0540
   13      -1.75     0.0216       0.0863        0.0862
   14      -1.50     0.0335       0.1341        0.1295
   15      -1.25     0.0481       0.1923        0.1826
   16      -1.00     0.0638       0.2553        0.2420
   17      -0.75     0.0788       0.3154        0.3011
   18      -0.50     0.0909       0.3636        0.3521
   19      -0.25     0.0981       0.3923        0.3867
   20       0.00     0.0993       0.3972        0.3989
   21       0.25     0.0946       0.3783        0.3867
   22       0.50     0.0849       0.3396        0.3521
   23       0.75     0.0720       0.2879        0.3011
   24       1.00     0.0577       0.2309        0.2420
   25       1.25     0.0439       0.1755        0.1826
   26       1.50     0.0316       0.1266        0.1295
   27       1.75     0.0217       0.0867        0.0862
   28       2.00     0.0141       0.0565        0.0540
   29       2.25     0.0088       0.0351        0.0317
   30       2.50     0.0052       0.0208        0.0175
   31       2.75     0.0029       0.0117        0.0091
   32       3.00     0.0016       0.0063        0.0044


To get a clear-cut idea of the extent to which one can use
the asymptotic formula of DeMoivre-Laplace for finite $n$*, i.e.,
to replace the binomial law in determining probabilities $P_n(m)$
by the function $y = \varphi(x)$, we give the following example. For


* Very precise estimates of the remainder term are given in S. N. Bernstein's
paper “Returning to the Question of the Accuracy of the Limit Formula of
Laplace”, Izv. Akad. nauk S.S.S.R., Vol. 7, 1943 (in Russian).




TABLE 9
n = 400

    m        x        P_n(m)    √(npq)·P_n(m)     φ(x)
   56      -3.000     0.0004       0.0034        0.0044
   57      -2.875     0.0006       0.0051        0.0064
   58      -2.750     0.0009       0.0076        0.0091
   59      -2.625     0.0014       0.0104        0.0127
   60      -2.500     0.0019       0.0156        0.0175
   61      -2.375     0.0027       0.0218        0.0238
   62      -2.250     0.0037       0.0298        0.0317
   63      -2.125     0.0050       0.0399        0.0417
   64      -2.000     0.0066       0.0525        0.0540
   65      -1.875     0.0089       0.0679        0.0684
   66      -1.750     0.0108       0.0862        0.0862
   67      -1.625     0.0134       0.1075        0.1065
   68      -1.500     0.0164       0.1316        0.1295
   69      -1.375     0.0198       0.1583        0.1550
   70      -1.250     0.0234       0.1871        0.1827
   71      -1.125     0.0271       0.2175        0.2119
   72      -1.000     0.0310       0.2483        0.2420
   73      -0.875     0.0349       0.2789        0.2721
   74      -0.750     0.0385       0.3081        0.3011
   75      -0.625     0.0419       0.3317        0.3282
   76      -0.500     0.0447       0.3580        0.3521
   77      -0.375     0.0471       0.3766        0.3719
   78      -0.250     0.0487       0.3919        0.3867
   79      -0.125     0.0497       0.3973        0.3957
   80       0.000     0.0498       0.3985        0.3989
   81       0.125     0.0492       0.3956        0.3957
   82       0.250     0.0478       0.3828        0.3867
   83       0.375     0.0458       0.3666        0.3719
   84       0.500     0.0432       0.3459        0.3521
   85       0.625     0.0402       0.3215        0.3282
   86       0.750     0.0368       0.2944        0.3011
   87       0.875     0.0332       0.2656        0.2721
   88       1.000     0.0295       0.2362        0.2420
   89       1.125     0.0259       0.2070        0.2119
   90       1.250     0.0223       0.1788        0.1826
   91       1.375     0.0190       0.1523        0.1550
   92       1.500     0.0160       0.1279        0.1295
   93       1.625     0.0132       0.1059        0.1065
   94       1.750     0.0108       0.0865        0.0862
   95       1.875     0.0087       0.0696        0.0684
   96       2.000     0.0069       0.0553        0.0540
   97       2.125     0.0054       0.0433        0.0417
   98       2.250     0.0042       0.0335        0.0317
   99       2.375     0.0032       0.0255        0.0238
  100       2.500     0.0024       0.0192        0.0175
  101       2.625     0.0018       0.0142        0.0127
  102       2.750     0.0013       0.0105        0.0091
  103       2.875     0.0009       0.0075        0.0064
  104       3.000     0.0008       0.0054        0.0044


the sake of simplicity, consider the case $p = q = \frac{1}{2}$ and take
only those $n$ for which it is possible to have $x = 1$; for instance,
these can be $n = 25$, 100, 400, 1156. Namely for them $x = 1$
when $m = 15$, 55, 210, 595.

For the sake of brevity, put

$$P_n(m) = P_n$$

and

$$\frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{x^2}{2}} = Q_n$$

for $p = q = \frac{1}{2}$ and $x = 1$.

According to the local theorem of DeMoivre-Laplace, the ratio
$\frac{P_n}{Q_n}$ should tend to unity when $n \to \infty$. The calculation for the
values of $n$ given above yields


TABLE 10

n       $P_n$       $Q_n$       $P_n - Q_n$    $P_n/Q_n$
25      0.09742     0.09679     0.00063        1.0065
100     0.04847     0.04839     0.00008        1.0030
400     0.024207    0.024194    0.000013       1.0004
1156    0.014236    0.014234    0.000002       1.0001
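
The comparison in Table 10 can be reproduced numerically. The following is a minimal sketch, not the author's computation: the function names (`binomial_pmf`, `local_approximation`, `table_10_rows`) are hypothetical, and the exact probability is evaluated through logarithms of factorials (`math.lgamma`) so that large $n$ does not overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def binomial_pmf(n, m, p):
    """Exact probability P_n(m) of m successes in n trials, via logarithms."""
    log_c = lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
    return exp(log_c + m * log(p) + (n - m) * log(1.0 - p))


def local_approximation(n, m, p):
    """Q_n = exp(-x^2 / 2) / sqrt(2 pi n p q) with x = (m - np) / sqrt(npq)."""
    q = 1.0 - p
    x = (m - n * p) / sqrt(n * p * q)
    return exp(-x * x / 2.0) / sqrt(2.0 * pi * n * p * q)


def table_10_rows(cases=((25, 15), (100, 55), (400, 210), (1156, 595)), p=0.5):
    """Return rows (n, P_n, Q_n, P_n - Q_n, P_n / Q_n) for the given (n, m)."""
    rows = []
    for n, m in cases:
        exact = binomial_pmf(n, m, p)
        approx = local_approximation(n, m, p)
        rows.append((n, exact, approx, exact - approx, exact / approx))
    return rows


if __name__ == "__main__":
    for n, exact, approx, diff, ratio in table_10_rows():
        print(f"{n:5d} {exact:.6f} {approx:.6f} {diff:.6f} {ratio:.4f}")
```

The ratio column should drift toward 1 as $n$ grows, which is the content of the local theorem for the fixed value $x_m = 1$.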













Repeating literally all the reasoning in the proof of the local theorem of DeMoivre-Laplace, we can easily obtain the following multidimensional local theorem. Before formulating it we give the notations:

$$q_i = 1 - p_i, \qquad x_i = \frac{m_i - np_i}{\sqrt{np_iq_i}} \qquad (i = 1, 2, \ldots, k)$$

The quantity $x_i$ depends not only on $i$ (that is, on $p_i$) but also on $n$ and $m_i$; however, to save space we do not introduce any new indices.

Local Theorem. If the probabilities $p_1, p_2, \ldots, p_k$ of the occurrence, respectively, of events $A_1^{(s)}, A_2^{(s)}, \ldots, A_k^{(s)}$ in the $s$th trial do not depend on the number of the trial and are different from 0 and from 1 ($0 < p_i < 1$, $i = 1, 2, \ldots, k$), then the probability $P_n(m_1, m_2, \ldots, m_k)$ that in $n$ independent trials the events $A_i^{(s)}$ ($i = 1, 2, \ldots, k$) will occur $m_i$ times ($m_1 + m_2 + \ldots + m_k = n$) satisfies the relation*

$$\sqrt{n^{k-1}}\, P_n(m_1, m_2, \ldots, m_k) : \frac{e^{-\frac{1}{2}\sum_{i=1}^{k} q_i x_i^2}}{(2\pi)^{\frac{k-1}{2}} \sqrt{p_1 p_2 \cdots p_k}} \to 1 \qquad (n \to \infty)$$

uniformly in all $m_i$ ($i = 1, 2, \ldots, k$) for which the $x_i$ lie in arbitrary finite intervals $a_i \le x_i \le b_i$.


Sec. 13. The Integral Limit Theorem 

The local limit theorem just derived will be used to derive 
another limit relation of probability theory that is called the 
integral limit theorem. 

The Integral Theorem of DeMoivre-Laplace. If $\mu$ is the number of occurrences of an event in $n$ independent trials, in each of which the probability of the event is equal to $p$, and $0 < p < 1$, then the following relation holds uniformly in $a$ and $b$ ($-\infty \le a \le b \le +\infty$)


* This limiting relation is written in homogeneous coordinates; the quantities $x_i$ are connected by the relation $\sum_{i=1}^{k} x_i \sqrt{p_i q_i} = 0$, which is readily derived from the relations $\sum_{i=1}^{k} m_i = n$, $\sum_{i=1}^{k} p_i = 1$. To pass to independent variables, one of the arguments, for example $x_k$, must be eliminated from formula (2).





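as $n \to \infty$:

$$P\left\{a \le \frac{\mu - np}{\sqrt{npq}} < b\right\} - \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\, dz \to 0$$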


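Proof. For the sake of brevity we introduce the notation

$$P_n(a, b) = P\left\{a \le \frac{\mu - np}{\sqrt{npq}} < b\right\}$$

This probability is obviously equal to the sum $\sum P_n(m)$ extended to those values of $m$ for which $a \le x_m < b$, where, as before, $x_m = \frac{m - np}{\sqrt{npq}}$.

Now let us define the function $y = \Pi_n(x)$ as follows:

$$y = \Pi_n(x) = \begin{cases} 0 & \text{for } x < x_0 = -\dfrac{np}{\sqrt{npq}} \\[2mm] 0 & \text{for } x \ge x_n + \dfrac{1}{\sqrt{npq}} = \dfrac{1 + nq}{\sqrt{npq}} \\[2mm] \sqrt{npq}\, P_n(m) & \text{for } x_m \le x < x_{m+1} \quad (m = 0, 1, \ldots, n) \end{cases}$$

Obviously, the probability $P_n(m)$ is equal to the area bounded by the curve $y = \Pi_n(x)$, the $x$-axis and the ordinates at points $x = x_m$ and $x = x_{m+1}$, that is,

$$P_n(m) = \sqrt{npq}\, P_n(m)\,(x_{m+1} - x_m) = \int_{x_m}^{x_{m+1}} \Pi_n(x)\, dx$$

Whence it follows that the desired probability $P_n(a, b)$ is equal to the area between the curve $y = \Pi_n(x)$, the $x$-axis and the ordinates at points $x_{\underline{m}}$ and $x_{\overline{m}}$, where $\underline{m}$ and $\overline{m}$ are defined by the inequalities

$$a \le x_{\underline{m}} < a + \frac{1}{\sqrt{npq}}, \qquad b \le x_{\overline{m}} < b + \frac{1}{\sqrt{npq}}$$

And so

$$P_n(a, b) = \int_{x_{\underline{m}}}^{x_{\overline{m}}} \Pi_n(x)\, dx = \int_a^b \Pi_n(x)\, dx + \int_b^{x_{\overline{m}}} \Pi_n(x)\, dx - \int_a^{x_{\underline{m}}} \Pi_n(x)\, dx$$

Since the maximal value of the probability $P_n(m)$ lies at the value $m_0 = [(n + 1)p]$, the maximal value of $\Pi_n(x)$ falls in the interval

$$\frac{m_0 - np}{\sqrt{npq}} \le x < \frac{m_0 + 1 - np}{\sqrt{npq}} \le \frac{2}{\sqrt{npq}}$$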





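It is in this interval that the local theorem of DeMoivre-Laplace is operative and we can therefore conclude that for all sufficiently large values of $n$

$$\max \Pi_n(x) \le 2 \max \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}} = \sqrt{\frac{2}{\pi}}$$

From this we first of all draw the conclusion that

$$|\rho_n| = \left| \int_b^{x_{\overline{m}}} \Pi_n(x)\, dx - \int_a^{x_{\underline{m}}} \Pi_n(x)\, dx \right| \le \int_b^{x_{\overline{m}}} \max \Pi_n(x)\, dx + \int_a^{x_{\underline{m}}} \max \Pi_n(x)\, dx \le \sqrt{\frac{2}{\pi}}\, (x_{\overline{m}} - b + x_{\underline{m}} - a) \le 2 \sqrt{\frac{2}{\pi npq}}$$

and that, consequently,

$$\lim_{n \to \infty} \rho_n = 0$$

Thus, $P_n(a, b)$ differs from $\int_a^b \Pi_n(x)\, dx$ only by an infinitesimal.

We first assume that $a$ and $b$ are finite numbers. On this assumption, in accord with the local theorem, for $a \le x_m < b$,

$$\Pi_n(x_m) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x_m^2}{2}}\, [1 + \alpha_n(x_m)]$$

where $\alpha_n(x_m) \to 0$ uniformly in $x_m$ as $n \to \infty$. It is obvious that for the intermediate values of the argument as well,

$$\Pi_n(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, [1 + \alpha_n(x)]$$

and $\lim\limits_{n \to \infty} \max\limits_{a \le x < b} \alpha_n(x) = 0$. Indeed, for any $x$ in the interval $x_m \le x < x_{m+1}$ we have

$$\Pi_n(x) = \Pi_n(x_m) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, [1 + \alpha_n(x)]$$

where

$$\alpha_n(x) = e^{\frac{x^2 - x_m^2}{2}}\, [\alpha_n(x_m) + 1] - 1$$

Since

$$\left| \frac{x^2 - x_m^2}{2} \right| < |x| \cdot |x - x_m| < \frac{\max(|a|, |b|)}{\sqrt{npq}}$$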





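it follows that

$$\lim_{n \to \infty} \max_{a \le x < b} \alpha_n(x) = 0$$

Collecting together all the estimates, we obtain

$$P_n(a, b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{x^2}{2}}\, dx + R_n$$

where

$$R_n = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{x^2}{2}}\, \alpha_n(x)\, dx + \rho_n$$

Since

$$|R_n| \le \max_{a \le x < b} |\alpha_n(x)| \cdot \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{x^2}{2}}\, dx + |\rho_n|$$

it is clear from the foregoing that

$$\lim_{n \to \infty} R_n = 0$$

The theorem is now proved on the particular assumption made in the course of the proof. We now have to get rid of this restriction.

For this purpose, we first of all note that *

$$\frac{1}{\sqrt{2\pi}} \int e^{-\frac{z^2}{2}}\, dz = 1$$

For this reason, for any $\varepsilon > 0$ it is possible to choose an $A$ so large that

$$\frac{1}{\sqrt{2\pi}} \int_{-A}^{A} e^{-\frac{z^2}{2}}\, dz > 1 - \frac{\varepsilon}{4}, \qquad \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-A} e^{-\frac{z^2}{2}}\, dz = \frac{1}{\sqrt{2\pi}} \int_{A}^{\infty} e^{-\frac{z^2}{2}}\, dz < \frac{\varepsilon}{8}$$

Also, in accordance with what has been proved, choose an $n$ so large that for $-A \le a < b \le A$ we will have

$$\left| P_n(a, b) - \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\, dz \right| < \frac{\varepsilon}{4}$$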


* Here and henceforward, definite integrals without indicated limits are taken from $-\infty$ to $+\infty$.





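Then it is obvious that

$$P_n(-\infty, -A) + P_n(A, +\infty) = 1 - P_n(-A, A) < \frac{\varepsilon}{2}$$

Now let us prove that for any $a$ and $b$ ($-\infty \le a \le b \le +\infty$) we will have

$$\left| P_n(a, b) - \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\, dz \right| < \varepsilon$$

thus obviously completing the proof of the Laplace theorem.

To do this we must separately examine different cases of the location of points $a$ and $b$ on the straight line relative to the interval $(-A, A)$. For example, let us take the case $a \le -A$, $b \ge A$ (the others are left to the reader).

In this case

$$\int_a^b = \int_a^{-A} + \int_{-A}^{A} + \int_A^b, \qquad P_n(a, b) = P_n(a, -A) + P_n(-A, A) + P_n(A, b)$$

Therefore

$$\left| P_n(a, b) - \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\, dz \right| \le \left| P_n(a, -A) - \frac{1}{\sqrt{2\pi}} \int_a^{-A} e^{-\frac{z^2}{2}}\, dz \right| + \left| P_n(-A, A) - \frac{1}{\sqrt{2\pi}} \int_{-A}^{A} e^{-\frac{z^2}{2}}\, dz \right| + \left| P_n(A, b) - \frac{1}{\sqrt{2\pi}} \int_A^b e^{-\frac{z^2}{2}}\, dz \right|$$

$$\le P_n(-\infty, -A) + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-A} e^{-\frac{z^2}{2}}\, dz + \left| P_n(-A, A) - \frac{1}{\sqrt{2\pi}} \int_{-A}^{A} e^{-\frac{z^2}{2}}\, dz \right| + P_n(A, +\infty) + \frac{1}{\sqrt{2\pi}} \int_A^{\infty} e^{-\frac{z^2}{2}}\, dz < \varepsilon$$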
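
The statement of the integral theorem can be checked numerically for a particular choice of parameters. The sketch below is only an illustration under assumed values ($n = 10{,}000$, $p = 0.3$, $a = -1$, $b = 2$ are not taken from the text); the helper names are hypothetical. It sums the exact probabilities $P_n(m)$ over $a \le x_m < b$ and compares the sum with the Gaussian integral, which is written through the standard error function `math.erf`.

```python
from math import erf, exp, lgamma, log, sqrt


def binomial_pmf(n, m, p):
    """Exact P_n(m), computed through logarithms to avoid overflow."""
    log_c = lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
    return exp(log_c + m * log(p) + (n - m) * log(1.0 - p))


def prob_normalized_between(n, p, a, b):
    """P{a <= (mu - np)/sqrt(npq) < b}: sum of P_n(m) over a <= x_m < b."""
    s = sqrt(n * p * (1.0 - p))
    return sum(
        binomial_pmf(n, m, p)
        for m in range(n + 1)
        if a <= (m - n * p) / s < b
    )


def gauss_integral(a, b):
    """(1/sqrt(2 pi)) * integral of exp(-z^2/2) from a to b."""
    return 0.5 * (erf(b / sqrt(2.0)) - erf(a / sqrt(2.0)))


def gap(n, p, a, b):
    """Absolute difference between the exact probability and the integral."""
    return abs(prob_normalized_between(n, p, a, b) - gauss_integral(a, b))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, gap(n, 0.3, -1.0, 2.0))
```

For any fixed $n$ the gap is not zero; the theorem only asserts that it tends to zero, uniformly in $a$ and $b$, as $n$ grows.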


Let us now formulate the integral limit theorem in the general case of a scheme of a sequence of independent trials. As before, let $\mu_i$ ($i = 1, 2, \ldots, k$) denote the number of occurrences of events $A_i^{(s)}$ ($s = 1, 2, \ldots, n$) in $n$ successive trials. Depending on chance, the numbers $\mu_i$ may assume only the values $0, 1, 2, \ldots, n$, and since $k$ outcomes are possible in each trial and these outcomes are mutually exclusive, the following equation must hold:

$$\mu_1 + \mu_2 + \ldots + \mu_k = n \qquad (1)$$

Let us now regard the quantities $\mu_1, \mu_2, \ldots, \mu_k$ as the rectangular coordinates of a point in $k$-dimensional Euclidean space.

Here, the results of $n$ trials will be depicted as a point with integral coordinates not less than zero and not greater than $n$; in future we shall call such points integral points. Equation (1) shows that the results of trials will not be depicted by arbitrary integral points in the hypercube $0 \le \mu_i \le n$ ($i = 1, 2, \ldots, k$), but only by those that lie in the hyperplane (1). Figure 10 illustrates the position of possible results of trials in the hyperplane (1) for the case $n = 3$, $k = 3$.

Fig. 10

Transform the coordinates by means of the formulas

$$x_i = \frac{\mu_i - np_i}{\sqrt{np_iq_i}} \qquad (i = 1, 2, \ldots, k;\ q_i = 1 - p_i)$$

In the new coordinates, the equation of the hyperplane (1) will be of the form

$$\sum_{i=1}^{k} x_i \sqrt{np_iq_i} = 0 \qquad (2)$$

We also agree to call integral those points of the hyperplane (2) into which the integral points of hyperplane (1) were transformed.

Denote by $P_n(G)$ the probability that as a result of $n$ trials the numbers $\mu_i$ ($i = 1, 2, \ldots, k$) of occurrences of each of the possible outcomes will be such that the point with coordinates

$$x_i = \frac{\mu_i - np_i}{\sqrt{np_iq_i}}$$

will fall inside the region $G$.

We then have the following 

Theorem. If in a scheme of a sequence of independent trials there are $k$ possible outcomes in each of the trials, and the probability of each of the outcomes is independent of the number of the trial and is different from 0 and from 1, then no matter what the region $G$ of the hyperplane (2), for which the $(k-1)$-dimensional volume of its boundary is zero, the following relation holds uniformly in $G$ as $n \to \infty$:

$$P_n(G) - \sqrt{\frac{q_1 q_2 \cdots q_k}{(2\pi)^{k-1} \sum_{i=1}^{k} p_i q_i}} \int_G e^{-\frac{1}{2} \sum_{i=1}^{k} q_i x_i^2}\, dv \to 0$$

where $dv$ denotes the volume element of the region $G$ and the integral extends over the region $G$.

We expressed the theorem just formulated in a form in which all the variables $x_1, \ldots, x_k$ play the same role. In the integral theorem of DeMoivre-Laplace, however, we preferred to carry out the reasoning only with the variable $x = x_1$, violating the homogeneity of the variables $x_1$ and $x_2$. Geometrically, this meant that we did not regard the results of the trials themselves (integral points on the straight line $x_1 + x_2 = 0$) but their projections on the $x$-axis. In similar fashion, by violating the homogeneity in the general case, we can consider integration not over the region $G$ but over its projection $G'$ on some coordinate hyperplane, say, on the plane $x_k = 0$. Volume element $dv'$ in the hyperplane $x_k = 0$ is connected with volume element $dv$ of hyperplane (2) by the relation

$$dv' = dv \cos \varphi$$

where $\varphi$ is the angle between the indicated hyperplanes. It is easy to calculate that

$$\cos \varphi = \frac{\sqrt{p_k q_k}}{\sqrt{\sum_{i=1}^{k} p_i q_i}}$$

In the coordinate hyperplane the volume element $dv' = dx_1\, dx_2 \ldots dx_{k-1}$; we therefore have the equation

$$\sqrt{\frac{q_1 q_2 \cdots q_k}{(2\pi)^{k-1} \sum_{i=1}^{k} p_i q_i}} \int_G e^{-\frac{1}{2} \sum_{i=1}^{k} q_i x_i^2}\, dv = \sqrt{\frac{q_1 q_2 \cdots q_{k-1}}{(2\pi)^{k-1} p_k}} \int_{G'} e^{-\frac{1}{2} \sum_{i=1}^{k} q_i x_i^2}\, dx_1 \ldots dx_{k-1}$$

In the integrand we must replace $x_k$ by its expression in terms of $x_1, x_2, \ldots, x_{k-1}$:

$$x_k = -\frac{1}{\sqrt{p_k q_k}} \sum_{i=1}^{k-1} \sqrt{p_i q_i}\, x_i$$





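As a result of this substitution we have

$$\sum_{i=1}^{k} q_i x_i^2 = \sum_{i=1}^{k-1} q_i \left(1 + \frac{p_i}{p_k}\right) x_i^2 + 2 \sum_{1 \le i < j \le k-1} \frac{\sqrt{p_i q_i p_j q_j}}{p_k}\, x_i x_j = Q(x_1, x_2, \ldots, x_{k-1}) \qquad (3)$$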


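Thus, the integral limit theorem may be formulated differently: In the conditions of the integral limit theorem, as $n \to \infty$,

$$P_n(G) - \sqrt{\frac{q_1 q_2 \cdots q_{k-1}}{(2\pi)^{k-1} p_k}} \int \cdots \int_{G'} e^{-\frac{1}{2} Q(x_1, \ldots, x_{k-1})}\, dx_1\, dx_2 \ldots dx_{k-1} \to 0 \qquad (4)$$

The integral theorem of DeMoivre-Laplace is a special case of the theorem just proved: it is readily obtainable from formula (4). To do this, it is sufficient to note that in the Bernoulli scheme

$$k = 2, \quad p = p_1, \quad q = p_2 = 1 - p$$

For $k = 3$ formula (4) takes on the following form:

$$P_n(G) - \frac{1}{2\pi} \sqrt{\frac{q_1 q_2}{p_3}} \iint_{G'} e^{-\frac{1}{2} Q(x_1, x_2)}\, dx_1\, dx_2 \to 0$$

where

$$p_3 = 1 - p_1 - p_2,$$

$$Q(x_1, x_2) = q_1 \left(1 + \frac{p_1}{p_3}\right) x_1^2 + q_2 \left(1 + \frac{p_2}{p_3}\right) x_2^2 + 2\, \frac{\sqrt{p_1 q_1 p_2 q_2}}{p_3}\, x_1 x_2 = \frac{q_1 q_2}{p_3} \left( x_1^2 + x_2^2 + 2 \sqrt{\frac{p_1 p_2}{q_1 q_2}}\, x_1 x_2 \right)$$

A simple calculation shows that

$$p_3 = 1 - p_1 - p_2 = q_1 q_2 - p_1 p_2;*$$

therefore,

$$Q(x_1, x_2) = \frac{1}{1 - \dfrac{p_1 p_2}{q_1 q_2}} \left( x_1^2 + x_2^2 + 2 \sqrt{\frac{p_1 p_2}{q_1 q_2}}\, x_1 x_2 \right)$$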


Sec. 14. Applications of the Integral Theorem of 
DeMoivre-Laplace 


As a first application of the integral theorem of DeMoivre-Laplace, we estimate the probability of the inequality

$$\left| \frac{\mu}{n} - p \right| < \varepsilon$$


* Indeed, since $p_1 + q_1 = 1$ and $p_2 + q_2 = 1$, it follows that $1 - p_1 - p_2 = q_1 - p_2 = q_1 (p_2 + q_2) - p_2 (p_1 + q_1) = q_1 q_2 - p_1 p_2$.





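where $\varepsilon > 0$ is a constant.

We have:

$$P\left\{ \left| \frac{\mu}{n} - p \right| < \varepsilon \right\} = P\left\{ -\varepsilon \sqrt{\frac{n}{pq}} < \frac{\mu - np}{\sqrt{npq}} < \varepsilon \sqrt{\frac{n}{pq}} \right\}$$

and, consequently, by virtue of the integral theorem of DeMoivre-Laplace

$$\lim_{n \to \infty} P\left\{ \left| \frac{\mu}{n} - p \right| < \varepsilon \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz = 1$$

And so, no matter what the constant $\varepsilon > 0$, the probability of the inequality $\left| \frac{\mu}{n} - p \right| < \varepsilon$ tends to unity.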

This fact was first found by James Bernoulli. It is called the law of large numbers, or Bernoulli's theorem. Bernoulli's theorem and its numerous generalizations are some of the most important of the theorems of probability theory. It is precisely via them that the theory contacts practice and in them that we see the fundamental success of the application of probability theory to diverse problems of natural science and technology. This will be treated in more detail in the chapter devoted to the law of large numbers, where we will prove the Bernoulli theorem by a simpler method that differs both from the one just given and from Bernoulli's.

We now consider typical problems that lead to the DeMoivre-Laplace theorem.

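A total of $n$ independent trials are carried out, for each of which the probability of occurrence of event $A$ is $p$.

I. What is the probability that the frequency of occurrence of event $A$ will deviate from the probability $p$ by no more than $\alpha$? This probability is

$$P\left\{ \left| \frac{\mu}{n} - p \right| \le \alpha \right\} = P\left\{ -\alpha \sqrt{\frac{n}{pq}} \le \frac{\mu - np}{\sqrt{npq}} \le \alpha \sqrt{\frac{n}{pq}} \right\} \approx \frac{2}{\sqrt{2\pi}} \int_0^{\alpha \sqrt{\frac{n}{pq}}} e^{-\frac{z^2}{2}}\, dz$$

II. What is the least number of trials that must be carried out so that, with a probability not less than $\beta$, the frequency should deviate from the probability by no more than $\alpha$? We must determine $n$ from the inequality

$$P\left\{ \left| \frac{\mu}{n} - p \right| \le \alpha \right\} \ge \beta$$

We replace the probability on the left-hand side of the inequality approximately by an integral using the DeMoivre-Laplace theorem. For a determination of $n$ this yields the inequality

$$\frac{2}{\sqrt{2\pi}} \int_0^{\alpha \sqrt{\frac{n}{pq}}} e^{-\frac{z^2}{2}}\, dz \ge \beta$$

III. For a given probability $\beta$ and the number of trials $n$, it is required to determine the boundary of possible variations of $\left| \frac{\mu}{n} - p \right|$. In other words, knowing $\beta$ and $n$, it is necessary to find $\alpha$, for which

$$P\left\{ \left| \frac{\mu}{n} - p \right| < \alpha \right\} = \beta$$

For determining $\alpha$, the integral theorem of Laplace yields the equation

$$\frac{2}{\sqrt{2\pi}} \int_0^{\alpha \sqrt{\frac{n}{pq}}} e^{-\frac{x^2}{2}}\, dx = \beta$$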


The numerical solutions of all the problems we have considered involve evaluating the integral

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-\frac{z^2}{2}}\, dz \qquad (1)$$

for any values of $x$ and solving the inverse problem: from the value of the integral $\Phi(x)$ compute the appropriate value of the argument $x$. These calculations require special tables, since for $0 < x < \infty$ the integral (1) is not expressible in closed form in terms of elementary functions. Such tables have been compiled and are given at the end of the book (see the Appendix).

Figure 11 gives a pictorial idea of the function $\Phi(x)$. Using the table of the values of the function $\Phi(x)$ it is also possible to evaluate the integral

$$J(a, b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\frac{z^2}{2}}\, dz$$

(using the formula $J(a, b) = \Phi(b) - \Phi(a)$).

The table of the function $\Phi(x)$ is compiled solely for positive $x$; for negative $x$, the function $\Phi(x)$ is found from the equation

$$\Phi(-x) = -\Phi(x)$$
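
In place of printed tables, the function $\Phi(x)$ can be evaluated with the error function, since $\Phi(x) = \frac{1}{2}\operatorname{erf}(x/\sqrt{2})$ when $\Phi$ is the integral from 0 to $x$ used in this section. The sketch below is an illustration only: the names `Phi`, `J`, `inverse_Phi`, `freq_deviation_prob` and `least_trials` are hypothetical, the inverse is found by simple bisection, and the sample numbers ($p = 0.5$, $\alpha = 0.01$, $\beta = 0.95$) are assumptions rather than values from the text.

```python
from math import ceil, erf, sqrt


def Phi(x):
    """(1/sqrt(2 pi)) * integral of exp(-z^2/2) from 0 to x (odd in x)."""
    return 0.5 * erf(x / sqrt(2.0))


def J(a, b):
    """Integral of the normal density from a to b: Phi(b) - Phi(a)."""
    return Phi(b) - Phi(a)


def inverse_Phi(y, hi=10.0):
    """Solve Phi(x) = y for 0 <= y < 0.5 by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def freq_deviation_prob(n, p, alpha):
    """Problem I: approximate P{|mu/n - p| <= alpha} = 2 Phi(alpha sqrt(n/pq))."""
    return 2.0 * Phi(alpha * sqrt(n / (p * (1.0 - p))))


def least_trials(p, alpha, beta):
    """Problem II: smallest n with 2 Phi(alpha sqrt(n/pq)) >= beta."""
    t = inverse_Phi(beta / 2.0)
    return ceil(t * t * p * (1.0 - p) / alpha ** 2)


if __name__ == "__main__":
    print(Phi(2.84), J(-1.0, 2.0))
    print(least_trials(0.5, 0.01, 0.95))
```

Problem III is the same inversion read the other way: for given $\beta$ and $n$ one would take $\alpha = \texttt{inverse\_Phi}(\beta/2)\sqrt{pq/n}$.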





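Fig. 11

We are now in a position to complete the solution of Example 1 of Sec. 11.

Example 1. In Example 1 of Sec. 11 we had to find the probability

$$\beta = \sum P\{\mu = m\}$$

where the sum is extended to those values of $m$ for which

$$|m - 2.7 \times 10^{22}| \ge 2.7 \times 10^{12}$$

provided that the total number of trials $n = 5.4 \times 10^{22}$ and $p = \frac{1}{2}$.

Since

$$\beta = P\left\{ \frac{|\mu - np|}{\sqrt{npq}} \ge \frac{2.7 \times 10^{12}}{\sqrt{5.4 \times 10^{22} \times \frac{1}{4}}} \right\} \approx P\left\{ \frac{|\mu - np|}{\sqrt{npq}} \ge 2.33 \times 10 \right\}$$

by virtue of the Laplace theorem

$$\beta \approx \frac{2}{\sqrt{2\pi}} \int_{2.33 \times 10}^{\infty} e^{-\frac{x^2}{2}}\, dx$$

Since

$$\int_z^{\infty} e^{-\frac{x^2}{2}}\, dx < \frac{1}{z} \int_z^{\infty} x e^{-\frac{x^2}{2}}\, dx = \frac{1}{z}\, e^{-\frac{z^2}{2}}$$

it follows that

$$\beta < \frac{1}{\sqrt{2\pi} \times 10}\, e^{-2.7 \times 100} < 10^{-100}$$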

To get an idea of just how small this probability is, suppose a sphere of radius 6,000 km is filled with white sand, one grain of which is black and of size 1 mm³. One grain of sand is taken at random from this mass. What is the probability that it will be black?

It is easy to calculate that the volume of the 6,000-km-radius sphere is slightly less than $10^{30}$ mm³ and, consequently, the probability of extracting a black grain of sand is somewhat greater than $10^{-30}$.


Example 2. In Example 2 of Sec. 11 we had to find the probability that the number of defective items would not exceed seventy if the probability for each item being defective was $p = 0.005$ and the number of items was 10,000. From the theorem just proved, this probability is

$$P\{\mu \le 70\} = P\left\{ -\frac{50}{\sqrt{49.75}} \le \frac{\mu - np}{\sqrt{npq}} \le \frac{20}{\sqrt{49.75}} \right\} = P\left\{ -7.09 \le \frac{\mu - np}{\sqrt{npq}} \le 2.84 \right\} \approx \frac{1}{\sqrt{2\pi}} \int_{-7.09}^{2.84} e^{-\frac{z^2}{2}}\, dz =$$

$$= \Phi(2.84) - \Phi(-7.09) = \Phi(2.84) + \Phi(7.09) = 0.9975$$

The tables do not contain values of the function $\Phi(x)$ for $x = 7.09$, so we replaced it with the value one half, thus committing an error less than $10^{-10}$.
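
The quality of the approximation in Example 2 can be examined by comparing it with a direct summation of the binomial probabilities. This is a hedged sketch with hypothetical helper names; the exact sum is built from the recurrence $P_n(m+1) = P_n(m)\,\frac{n-m}{m+1}\,\frac{p}{q}$, and the normal value uses `math.erf` instead of the printed tables, so its last digits need not coincide with the rounded figure quoted in the text.

```python
from math import erf, sqrt


def binomial_cdf(n, p, k):
    """P{mu <= k} by summing P_n(0), ..., P_n(k) with the term recurrence."""
    q = 1.0 - p
    term = q ** n          # P_n(0)
    total = term
    for m in range(k):
        term *= (n - m) / (m + 1) * p / q   # P_n(m+1) from P_n(m)
        total += term
    return total


def normal_cdf_estimate(n, p, k):
    """DeMoivre-Laplace estimate of P{0 <= mu <= k} without corrections."""
    s = sqrt(n * p * (1.0 - p))
    lower = (0 - n * p) / s
    upper = (k - n * p) / s
    return 0.5 * (erf(upper / sqrt(2.0)) - erf(lower / sqrt(2.0)))


if __name__ == "__main__":
    n, p, k = 10_000, 0.005, 70
    print("exact sum      :", binomial_cdf(n, p, k))
    print("normal estimate:", normal_cdf_estimate(n, p, k))
```

With $p$ this small the binomial law is noticeably skewed, which is one reason the text turns to Poisson's formula in Sec. 15.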


Naturally, in the examples of this section and the previous one, as in any other problems relating to the determination of probabilities $P_n(m)$ for certain finite values of $m$ and $n$ by the asymptotic DeMoivre-Laplace formulas, it is required to estimate the error due to such a replacement. For a very long time, the DeMoivre-Laplace theorems were applied to the solution of similar problems without a satisfactory estimate of the remainder term. A purely empirical confidence developed that for $n$ of the order of several hundreds or more and for $p$ not too close to 0 or 1, the DeMoivre-Laplace theorems give satisfactory results. At present we have sufficiently good estimates of errors resulting from the use of the asymptotic DeMoivre-Laplace formula.*

Let us also examine the generalization of Bernoulli's theorem to the case of a general scheme of a sequence of independent trials. In each trial let there be $k$ possible outcomes, the probability of each of which is equal, respectively, to $p_1, p_2, \ldots, p_k$ and let $\mu_1, \mu_2, \ldots, \mu_k$ be the numbers of occurrences of each outcome in the sequence of $n$ independent trials. We determine the probability of a simultaneous realization of the inequalities

$$\left| \frac{\mu_1}{n} - p_1 \right| < \varepsilon_1, \quad \left| \frac{\mu_2}{n} - p_2 \right| < \varepsilon_2, \quad \ldots, \quad \left| \frac{\mu_k}{n} - p_k \right| < \varepsilon_k \qquad (1)$$

that is, of the inequalities

$$|x_1| < \varepsilon_1 \sqrt{\frac{n}{p_1 q_1}}, \quad |x_2| < \varepsilon_2 \sqrt{\frac{n}{p_2 q_2}}, \quad \ldots, \quad |x_k| < \varepsilon_k \sqrt{\frac{n}{p_k q_k}} \qquad (2)$$


* See, for example, the paper by S. N. Bernstein cited on p. 81.





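Strictly speaking, the last of these inequalities is a corollary to the preceding ones, since in accordance with (2) of Sec. 13, the first $k - 1$ of the inequalities (2) yield the estimate

$$|x_k| = \left| \sum_{i=1}^{k-1} \sqrt{\frac{p_i q_i}{p_k q_k}}\, x_i \right| < \sum_{i=1}^{k-1} \varepsilon_i \sqrt{\frac{n}{p_k q_k}} \qquad (3)$$

According to (4) of Sec. 13, the probability of the first $k - 1$ inequalities (2) and hence also of inequality (3) has as its limit, as $n \to \infty$, the integral

$$\sqrt{\frac{q_1 q_2 \cdots q_{k-1}}{(2\pi)^{k-1} p_k}} \int \cdots \int e^{-\frac{1}{2} Q(x_1, \ldots, x_{k-1})}\, dx_1 \ldots dx_{k-1} = 1$$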


Sec. 15. Poisson's Theorem 

From the proof of the local theorem of DeMoivre-Laplace it may be noted that the asymptotic representation of the probability $P_n(m)$ by means of the function $\frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{x^2}{2}}$ deteriorates the more the probability $p$ differs from one half, that is, the smaller the values of $p$ or $q$ that have to be considered; this representation fails for $p = 0$, $q = 1$, and also for $p = 1$, $q = 0$. However, a substantial range of problems necessitates finding probabilities $P_n(m)$ for just such small values of $p$.* So that the DeMoivre-Laplace theorem should in that case yield a result with only a slight error it is necessary that the number $n$ of trials be very great. The problem therefore arises of finding an asymptotic formula specially adapted to the case of small $p$. Such a formula was found by Poisson.

We consider a double sequence of events

$$\begin{array}{l} A_{11}, \\ A_{21},\ A_{22}, \\ \ldots \\ A_{n1},\ A_{n2},\ A_{n3},\ \ldots,\ A_{nn}, \\ \ldots \end{array}$$

in which the events of one row are mutually independent and each has a probability $p_n$ that depends solely on the number of the row. By $\mu_n$ we denote the number of events that actually occur in the $n$th row.


* Or for small values of $q$ as well; it is obvious, however, that the problems of seeking asymptotic formulas for $P_n(m)$ for small values of $p$ and $q$ reduce to one another.







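Poisson's Theorem. If $p_n \to 0$ as $n \to \infty$, then

$$P\{\mu_n = m\} - \frac{a_n^m}{m!}\, e^{-a_n} \to 0 \qquad (1)$$

where

$$a_n = np_n$$

Proof. It is obvious that

$$P_n(m) = P\{\mu_n = m\} = C_n^m\, p_n^m (1 - p_n)^{n-m} = \frac{n!}{m!\,(n - m)!} \left(\frac{a_n}{n}\right)^m \left(1 - \frac{a_n}{n}\right)^{n-m} = \frac{a_n^m}{m!} \left(1 - \frac{a_n}{n}\right)^n \frac{\left(1 - \frac{1}{n}\right) \left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)}{\left(1 - \frac{a_n}{n}\right)^m} \qquad (2)$$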



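Let $m$ be fixed. We choose an arbitrary $\varepsilon > 0$. Then it is possible to choose $A = A(\varepsilon)$ so large that for $a \ge A$ we would have

$$\frac{a^m}{m!}\, e^{-\frac{a}{2}} \le \frac{\varepsilon}{2}$$

Let us first consider those numbers $n$ for which $a_n \ge A$. For these $n$ (taking $n \ge 2m$, so that $\frac{n - m}{n} \ge \frac{1}{2}$), we have, from the inequality $1 - x \le e^{-x}$ ($0 \le x \le 1$):

$$P_n(m) \le \frac{a_n^m}{m!} \left(1 - \frac{a_n}{n}\right)^{n-m} \le \frac{a_n^m}{m!}\, e^{-\frac{n-m}{n} a_n} \le \frac{\varepsilon}{2}$$

Therefore, for the indicated $n$,

$$\left| P_n(m) - \frac{a_n^m}{m!}\, e^{-a_n} \right| \le P_n(m) + \frac{a_n^m}{m!}\, e^{-a_n} \le \varepsilon$$

We now consider those numbers $n$ for which $a_n \le A$. Since

$$\lim_{n \to \infty} \left[ \left(1 - \frac{a_n}{n}\right)^n - e^{-a_n} \right] = 0$$

for $a_n \le A$ and, for constant $m$,

$$\frac{\left(1 - \frac{1}{n}\right) \cdots \left(1 - \frac{m-1}{n}\right)}{\left(1 - \frac{a_n}{n}\right)^m} \to 1$$

it follows that by virtue of formula (2), for all sufficiently large $n$,

$$\left| P_n(m) - \frac{a_n^m}{m!}\, e^{-a_n} \right| < \varepsilon$$

and the proof is complete.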
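
The convergence asserted by Poisson's theorem can be watched numerically by holding $a = np_n$ fixed and letting $n$ grow. This is a minimal sketch under that assumption (the theorem itself allows $a_n$ to vary with $n$); the function names and the choice $a = 2$ are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(n, m, p):
    """Exact P_n(m) = C(n, m) p^m (1 - p)^(n - m) for moderate m."""
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def poisson_pmf(a, m):
    """Poisson probability a^m / m! * exp(-a)."""
    return a ** m / factorial(m) * exp(-a)


def max_gap(n, a, m_max=20):
    """Largest |P_n(m) - Poisson(m)| over m = 0..m_max with p_n = a / n."""
    return max(
        abs(binomial_pmf(n, m, a / n) - poisson_pmf(a, m))
        for m in range(min(m_max, n) + 1)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, max_gap(n, 2.0))
```

The printed gaps shrink roughly in proportion to $1/n$, in line with the error estimate for $m = 0$ derived later in this section.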





We note that the Poisson theorem is also valid when the probability of the event $A$ is zero in each trial. In this case, $a_n = 0$. We denote

$$P(m) = \frac{a^m}{m!}\, e^{-a}$$

The probability distribution thus obtained is called Poisson's law or the Poisson distribution.

It is easy to calculate that the quantities $P(m)$ satisfy the equality $\sum\limits_{m} P(m) = 1$. Let us study the behaviour of $P(m)$ as a function of $m$. With this purpose in mind, consider the relation

$$\frac{P(m)}{P(m - 1)} = \frac{a}{m}$$

We see that if $m > a$, then $P(m) < P(m - 1)$, but if $m < a$, then $P(m) > P(m - 1)$; if, finally, $m = a$, then $P(m) = P(m - 1)$. From this we conclude that the quantity $P(m)$ increases with increasing $m$ from 0 to $m_0 = [a]$ and falls off with further increases in $m$. If $a$ is an integer, then $P(m)$ has two maximal values: at $m_0 = a$ and at $m_0 = a - 1$.

The following are examples. 


Example 1. The probability of hitting a target is 0.001 for each shot. Find the probability of hitting the target with two or more bullets if the number of shots is 5,000.*

Taking each shot as a trial and hitting the target as an event, we can take advantage of Poisson's theorem to compute the probability $P\{\mu_n \ge 2\}$. In our case,

$$a_n = np = 0.001 \times 5{,}000 = 5$$

The probability sought for is

$$P\{\mu_n \ge 2\} = \sum_{m=2}^{5{,}000} P_n(m) = 1 - P_n(0) - P_n(1)$$

By the Poisson theorem

$$P_n(0) \approx e^{-5}, \qquad P_n(1) \approx 5 e^{-5}$$

Therefore

$$P\{\mu_n \ge 2\} \approx 1 - 6 e^{-5} \approx 0.9596$$

The probability $P_n(m)$ takes on a maximal value for $m = 4$ and $m = 5$. To four decimal places, these probabilities are equal to

$$P(4) = P(5) \approx 0.1755$$
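
The numbers of Example 1 are easy to recompute. The following sketch (hypothetical function names) evaluates the Poisson approximation $1 - 6e^{-5}$, the corresponding exact binomial value, and the location of the maximum of $P(m)$ for $a = 5$.

```python
from math import exp, factorial


def poisson_pmf(a, m):
    """Poisson probability P(m) = a^m / m! * exp(-a)."""
    return a ** m / factorial(m) * exp(-a)


def at_least_two_poisson(a):
    """Poisson approximation of P{mu_n >= 2} = 1 - P(0) - P(1)."""
    return 1.0 - poisson_pmf(a, 0) - poisson_pmf(a, 1)


def at_least_two_exact(n, p):
    """Exact binomial value 1 - (1-p)^n - n p (1-p)^(n-1)."""
    return 1.0 - (1.0 - p) ** n - n * p * (1.0 - p) ** (n - 1)


def poisson_modes(a, m_max=50):
    """All m in 0..m_max at which P(m) is maximal (up to rounding)."""
    values = [poisson_pmf(a, m) for m in range(m_max + 1)]
    top = max(values)
    return [m for m, v in enumerate(values) if abs(v - top) < 1e-12]


if __name__ == "__main__":
    print(at_least_two_poisson(5.0), at_least_two_exact(5000, 0.001))
    print(poisson_modes(5.0), poisson_pmf(5.0, 4))
```

For an integer $a$ the list of modes contains both $a - 1$ and $a$, as the ratio $P(m)/P(m-1) = a/m$ predicts.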


Using an exact formula, the computations yield (to the fourth decimal place) $P_{5,000}(0) = 0.0067$, $P_{5,000}(1) = 0.0336$ and, consequently,

$$P\{\mu_n \ge 2\} = 0.9597$$

The error due to use of the asymptotic formula is less than 0.01% of the value being computed.

* In the Great Patriotic War, the conditions of our problem were realized in small-arms fire against aircraft. An aircraft can be shot down only if hit in a vulnerable spot: motor, pilot, fuel tank, etc. The probability of hitting these vulnerable spots with a single shot is extremely small, but, as a rule, a whole unit fired at once and the total number of shots was considerable. The probability of one or two bullets making a successful strike was then rather appreciable. This was also found to be so in actual cases.

Example 2. A worker at a spinning mill attends several hundred spindles, each of which spins its own skein of yarn. In the process of winding, the yarn breaks due to nonuniformity of tension, unevenness and other causes at chance instants of time. It is important to know how frequently such breaks can occur under one or another set of conditions (grade of yarn, speed of spindles, etc.).

Assuming that a worker attends 800 spindles and the probability of yarn breakage on each spindle during a certain interval of time $t$ is 0.005, find the most probable number of breaks and the probability that during the time interval $t$ there will be no more than 10 breaks. Since

$$a_n = np = 0.005 \times 800 = 4$$

there will be two most probable numbers of breaks in the time interval $t$: 3 and 4. Their probabilities are

$$P_{800}(3) = P_{800}(4) = C_{800}^{4} \times 0.005^4 \times 0.995^{796}$$

Using Poisson's formula we have

$$P_{800}(3) = P_{800}(4) \approx \frac{4^3}{3!}\, e^{-4} = 0.1954$$

The exact value of $P_{800}(3) = P_{800}(4) = 0.1945$. The probability that the number of breaks in a time interval $t$ will not exceed 10 is equal to

$$P\{\mu_n \le 10\} = \sum_{m=0}^{10} P_{800}(m) = 1 - \sum_{m=11}^{\infty} P_{800}(m)$$

By virtue of the Poisson theorem,

$$P_{800}(m) \approx \frac{4^m}{m!}\, e^{-4} \qquad (m = 0, 1, 2, \ldots)$$

and so

$$P\{\mu_n \le 10\} \approx 1 - \sum_{m=11}^{\infty} \frac{4^m}{m!}\, e^{-4}$$





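But

$$\sum_{m=11}^{\infty} \frac{4^m}{m!}\, e^{-4} > \left( \frac{4^{11}}{11!} + \frac{4^{12}}{12!} + \frac{4^{13}}{13!} \right) e^{-4} = \frac{4^{12} \cdot 14}{11! \cdot 39}\, e^{-4} = 0.00276$$

On the other hand,

$$\sum_{m=11}^{\infty} \frac{4^m}{m!}\, e^{-4} < \left\{ \frac{4^{11}}{11!} + \frac{4^{12}}{12!} + \frac{4^{13}}{13!} \left[ 1 + \frac{4}{14} + \left(\frac{4}{14}\right)^2 + \ldots \right] \right\} e^{-4} = \frac{4^{12} \cdot 24}{11! \cdot 65}\, e^{-4} = 0.00284$$

Thus,

$$0.99716 < P\{\mu_n \le 10\} < 0.99724$$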
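
The two bounds on the Poisson tail can be verified directly. In this sketch (hypothetical names) the tail $\sum_{m \ge 11} \frac{4^m}{m!} e^{-4}$ is computed as one minus the first eleven terms, the lower bound keeps three terms of the tail, and the upper bound replaces the remainder after $m = 13$ by the geometric series with ratio $4/14$ used in the text.

```python
from math import exp, factorial


def poisson_pmf(a, m):
    """Poisson probability a^m / m! * exp(-a)."""
    return a ** m / factorial(m) * exp(-a)


def tail_from(a, m_start):
    """Sum of Poisson probabilities over m >= m_start."""
    return 1.0 - sum(poisson_pmf(a, m) for m in range(m_start))


def tail_bounds(a=4.0, m_start=11):
    """Lower and upper bounds on tail_from(a, m_start) as in the example."""
    t0 = poisson_pmf(a, m_start)
    t1 = poisson_pmf(a, m_start + 1)
    t2 = poisson_pmf(a, m_start + 2)
    ratio = a / (m_start + 3)          # later terms shrink at least this fast
    lower = t0 + t1 + t2
    upper = t0 + t1 + t2 / (1.0 - ratio)
    return lower, upper


if __name__ == "__main__":
    lower, upper = tail_bounds()
    print(lower, tail_from(4.0, 11), upper)
    print(1.0 - upper, 1.0 - lower)
```

The last line prints the resulting two-sided estimate for $P\{\mu_n \le 10\}$ under the Poisson approximation.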


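Just as in the case when applying the DeMoivre-Laplace theorem, we have to estimate the error that results from replacing the exact formula for computing $P_n(m)$ by the asymptotic formula of Poisson. From the equation

$$P_n(0) = \left(1 - \frac{a_n}{n}\right)^n = e^{n \ln \left(1 - \frac{a_n}{n}\right)} = e^{-a_n}\, e^{-R_n}$$

where

$$R_n = n \sum_{k=2}^{\infty} \frac{1}{k} \left(\frac{a_n}{n}\right)^k$$

we can readily find this estimate for the case $m = 0$. Indeed, since for arbitrary positive $x$

$$0 < 1 - e^{-x} < x$$

it follows that, no matter what the $a_n$ and $n$ were,

$$0 < e^{-a_n} - P_n(0) = e^{-a_n} \left(1 - e^{-R_n}\right) < R_n\, e^{-a_n}$$

Since

$$\sum_{k=2}^{\infty} \frac{1}{k} \left(\frac{a_n}{n}\right)^k < \frac{1}{2} \left(\frac{a_n}{n}\right)^2 + \frac{1}{3} \sum_{k=3}^{\infty} \left(\frac{a_n}{n}\right)^k = \frac{a_n^2}{6n^2} \cdot \frac{3n - a_n}{n - a_n} < \frac{a_n^2}{2n(n - a_n)}$$

it follows that

$$0 < R_n < \frac{a_n^2}{2(n - a_n)}$$

From the fact that $R_n$ is nonnegative, we conclude that when $P_n(0)$ is replaced by $e^{-a_n}$ we increase somewhat the probability $P_n(0)$.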
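
The estimate for $m = 0$ can be tested on the parameters of the two examples above. The sketch below assumes the reconstruction $0 < e^{-a_n} - P_n(0) < R_n e^{-a_n}$ with $R_n < \frac{a_n^2}{2(n - a_n)}$; the function names are hypothetical.

```python
from math import exp


def zero_term_error(n, a):
    """exp(-a) - P_n(0), where P_n(0) = (1 - a/n)^n and a = n p."""
    return exp(-a) - (1.0 - a / n) ** n


def zero_term_bound(n, a):
    """Upper bound a^2 / (2 (n - a)) * exp(-a) on the error above."""
    return a * a / (2.0 * (n - a)) * exp(-a)


if __name__ == "__main__":
    for n, a in ((800, 4.0), (5000, 5.0), (100, 1.0)):
        print(n, a, zero_term_error(n, a), zero_term_bound(n, a))
```

In each case the error is positive, so replacing $P_n(0)$ by $e^{-a_n}$ overestimates it slightly, and the overestimate stays below the bound.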


Sec. 16. An Illustration of the Scheme 
of Independent Trials 

By way of illustrating the use of the foregoing results in the natural sciences, we consider very schematically the problem of a random walk of a particle on a straight line. This problem may be regarded as the prototype of actual physical problems in the theory of diffusion, Brownian motion, and so forth.

Imagine that at specific instants of time a particle, starting from the position $x = 0$, experiences random impacts that displace it to the right or the left one unit of distance. Thus, each time, the particle is shifted one unit to the right or one to the left with a probability of $\frac{1}{2}$. As a result of $n$ impacts the particle will have been displaced to a distance $\mu$. In this problem we clearly have to do with the Bernoulli scheme in its pure form. It then follows that for each $n$ and $m$ we can calculate the probability that $\mu = m$; namely,

$$P\{\mu = m\} = \begin{cases} C_n^{\frac{m+n}{2}} \left(\dfrac{1}{2}\right)^n & \text{if } -n \le m \le n \\[2mm] 0 & \text{if } |m| > n \end{cases}$$

For large values of $n$, as follows from the local theorem of DeMoivre-Laplace,

$$P\{\mu = m\} \approx \sqrt{\frac{2}{\pi n}}\, e^{-\frac{m^2}{2n}} \qquad (1)$$


We may regard this formula as follows. Suppose at an initial time 
there are a large number of particles with coordinate x=0. All these 
particles begin to move along a straight line independently of one 
another as a consequence of random impacts. Then, after n impacts, 
the portion of particles that has covered a distance m is given 
by formula (1). 
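
Formula (1) can be compared with the exact probabilities of the walk. A minimal sketch with hypothetical names follows; it makes explicit one point the formula leaves implicit, namely that the displacement $m$ must have the same parity as $n$ (otherwise $\frac{m+n}{2}$ is not an integer and the probability is zero), and it uses the sample values $n = 1000$, $m = 20$ as an assumption.

```python
from math import comb, exp, pi, sqrt


def walk_prob(n, m):
    """Exact probability that the displacement after n impacts equals m."""
    if abs(m) > n or (n + m) % 2 != 0:
        return 0.0
    return comb(n, (n + m) // 2) / 2 ** n


def walk_approx(n, m):
    """Formula (1): sqrt(2 / (pi n)) * exp(-m^2 / (2 n))."""
    return sqrt(2.0 / (pi * n)) * exp(-m * m / (2.0 * n))


if __name__ == "__main__":
    n = 1000
    for m in (0, 20, 40, 60):
        print(m, walk_prob(n, m), walk_approx(n, m))
```

Read as in the text, `walk_prob(n, m)` is the portion of a large population of independent particles found at distance $m$ after $n$ impacts.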

We of course consider idealized conditions of particle motion; 
actual molecules move under much more complicated conditions, 
but the overall result yields a correct qualitative picture of the pheno¬ 
menon. 

In physics, more involved examples of random walks are considered. 
We confine ourselves to just as schematic a consideration of the effects 
of: (1) a reflecting barrier; (2) an absorbing barrier. 

Imagine that at a distance of s units to the right of point x=0 
there is a reflecting barrier, such that a particle which at some time 





hits the barrier is, upon the next impact, returned with probability 
one in the same direction from which it arrived. 

Figure 12 gives the reader a more pictorial view of a particle on a plane $(x, t)$. The path of the particle will be depicted as a broken line. Each impact advances the particle one unit "upwards" and one unit to the right or the left (with a probability one half each time that $x < s$). Now if $x = s$, then on the next impact the particle will be shifted one unit to the left.

Fig. 12

To compute the probability $P\{\mu = m\}$ we do as follows: we mentally eliminate the barrier and allow the particle to move freely as if there were no barrier at all. Figure 12 shows such idealized paths that lead to points $A$ and $A'$, which are symmetric about the barrier. For an actual particle, moving with reflections, to reach $A$ it is necessary and sufficient for the particle moving in the idealized situation (without the reflecting barrier) to reach either $A$ or $A'$. But the probability of getting to point $A$ in the idealized situation is obviously equal to

$$P\{\mu = m\} = C_n^{\frac{n+m}{2}} \left(\frac{1}{2}\right)^n$$

In exactly the same way, the probability of getting to point $A'$ is equal (the abscissa of $A'$ is $2s - m$) to

$$P\{\mu = 2s - m\} = C_n^{\frac{n+2s-m}{2}} \left(\frac{1}{2}\right)^n$$

The desired probability is thus

$$P_n(m; s) = P\{\mu = m\} + P\{\mu = 2s - m\}$$


Taking advantage of the local limit theorem of DeMoivre-Laplace,

$$P_n(m; s) \approx \sqrt{\frac{2}{\pi n}} \left[ e^{-\frac{m^2}{2n}} + e^{-\frac{(2s-m)^2}{2n}} \right]$$

This is the famous formula of Brownian motion theory. It takes on a more symmetric form if the origin is placed at point $x = s$, and, hence, if we pass to a new coordinate $z$ by the formula $z = x - s$. This substitution gives us

$$P_n(z = k) = P_n(k + s; s) \approx \sqrt{\frac{2}{\pi n}} \left[ e^{-\frac{(k+s)^2}{2n}} + e^{-\frac{(k-s)^2}{2n}} \right]$$
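
The reflection argument can be checked exactly on small cases. The sketch below (hypothetical names; barrier position $s \ge 1$ and the values $n = 12$, $s = 3$ are assumptions) propagates the distribution of the reflected walk step by step with exact fractions and compares it, for points $m < s$, with the sum $P\{\mu = m\} + P\{\mu = 2s - m\}$ for the free walk.

```python
from fractions import Fraction
from math import comb


def free_walk(n, m):
    """Exact probability of displacement m after n impacts, no barrier."""
    if abs(m) > n or (n + m) % 2 != 0:
        return Fraction(0)
    return Fraction(comb(n, (n + m) // 2), 2 ** n)


def reflecting_walk(n, s):
    """Distribution after n impacts with a reflecting barrier at x = s >= 1.

    A particle standing at s is sent to s - 1 with probability one;
    elsewhere it moves one unit left or right with probability 1/2.
    """
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for x, pr in dist.items():
            if x == s:
                new[x - 1] = new.get(x - 1, Fraction(0)) + pr
            else:
                for step in (-1, 1):
                    y = x + step
                    new[y] = new.get(y, Fraction(0)) + pr / 2
        dist = new
    return dist


if __name__ == "__main__":
    n, s = 12, 3
    dist = reflecting_walk(n, s)
    for m in range(-4, s):
        print(m, dist.get(m, Fraction(0)), free_walk(n, m) + free_walk(n, 2 * s - m))
```

The comparison is made only for $m < s$; at the barrier itself the points $A$ and $A'$ coincide, so the two free-walk terms would count the same paths twice.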


We now consider the third schematic problem: an absorbing barrier is placed in the path of the particle at point $x = s$. A particle striking this barrier drops out of the motion. Obviously, in this example the probability of getting to point $x = m$ ($m < s$) after $n$ impacts will be less than $P_n(m)$ (that is, less than the probability of getting to this point without the absorbing barrier); denote the desired probability by the symbol $P_n(m; s)$.

To compute the probability $P_n(m; s)$ we again eliminate mentally the absorbing barrier and allow the particle to move freely along the straight line. If at some time the particle reaches $x = s$, at subsequent instants of time it will go to the right and left of the line $x = s$ (Fig. 13) with the same probability. In exactly the same way, after getting to the straight line $x = s$, the particle can reach both point $A(m, n)$ and point $A'(2s - m, n)$ with the same probability. But the particle can reach $A'$ only after first having reached the position $x = s$; therefore, for any pathway leading to $A'$ there is a path symmetric about the straight line $x = s$ and leading to $A$; in exactly the same way, for every prohibited pathway (in actual motion) leading to $A$ there is a path symmetric about $x = s$ that leads to point $A'$. It will be noted that here we consider the symmetry of pathways only after hitting the straight line $x = s$. The foregoing reasoning shows that when counting the number of favourable cases in actual motion we must eliminate from the paths leading to $A$ in idealized motion the exact number of paths that lead to point $A'$. It obviously follows from this that

$$P_n(m; s) = P\{\mu = m\} - P\{\mu = 2s - m\}$$

Fig. 13

By virtue of the local theorem of DeMoivre-Laplace we have

$$P_n(m; s) \approx \sqrt{\frac{2}{\pi n}} \left[ e^{-\frac{m^2}{2n}} - e^{-\frac{(2s-m)^2}{2n}} \right]$$
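
The subtraction rule for the absorbing barrier can be confirmed in the same way. This is a sketch with hypothetical names, assuming a barrier at $s \ge 1$ and the sample values $n = 12$, $s = 3$: probability mass that arrives at $x = s$ is discarded, and the surviving distribution is compared with $P\{\mu = m\} - P\{\mu = 2s - m\}$ for $m < s$.

```python
from fractions import Fraction
from math import comb


def free_walk(n, m):
    """Exact probability of displacement m after n impacts, no barrier."""
    if abs(m) > n or (n + m) % 2 != 0:
        return Fraction(0)
    return Fraction(comb(n, (n + m) // 2), 2 ** n)


def absorbing_walk(n, s):
    """Probabilities of positions after n impacts, absorbing barrier at s >= 1.

    A particle that arrives at x = s drops out, so its probability mass
    is removed and the returned values sum to less than one.
    """
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for x, pr in dist.items():
            for step in (-1, 1):
                y = x + step
                if y == s:
                    continue  # absorbed: the particle leaves the motion
                new[y] = new.get(y, Fraction(0)) + pr / 2
        dist = new
    return dist


if __name__ == "__main__":
    n, s = 12, 3
    dist = absorbing_walk(n, s)
    for m in range(-4, s):
        print(m, dist.get(m, Fraction(0)), free_walk(n, m) - free_walk(n, 2 * s - m))
    print("surviving mass:", sum(dist.values()))
```

The total surviving mass printed at the end is the probability that the particle has not yet been absorbed after $n$ impacts.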


EXERCISES 

1. A workman operates 12 machines of the same type. The probability that one machine will require his attention during a time interval of duration t is 1/3. What is the probability that:





(a) during time t 4 machines will demand the attention of the workman; 

(b) the number of such demands during time t will lie between 3 and 6 
(including the boundaries)? 

2. A certain family has 10 children. Considering the probability of birth of a boy and a girl equal to 1/2, find the probability that in this family

(a) there are 5 boys and 5 girls;

(b) the number of boys lies between 3 and 8.

3. In a gathering of 4 persons, the birthdays of three come in one month and 
that of the fourth in one of the remaining eleven months. Considering the prob¬ 
ability of birth of each person in each month equal to 1/12, find the proba¬ 
bility that 

(a) the three persons were born in January and the fourth in October; 

(b) the three were born in some one month and the fourth in one of the 
other eleven months. 

4. In 14,400 tosses of a coin, heads fell 7,428 times. How probable is such a large or larger deviation of the number of heads from np if the coin is symmetric (that is, the probability of throwing heads in each trial is 1/2)?

5. A total of n devices, each with a power consumption of a kilowatts, are connected to an electric network. At a given time each is consuming power with a probability p. Find the probability that the power consumed at the given time:

(a) will be less than nap;

(b) will exceed rnap (r > 0) provided that np is great.

6. An educational institution has a student body of 730. The probability that 
the birthday of a randomly selected student will fall on a definite day of the 
year is 1/365 for each of the 365 days. Find: 

(a) the most probable number of students born on January 1; 

(b) the probability that there will be three students with the same birthday. 

7. It is known that the probability of producing a drill bit of extra-high
brittleness (defective) is 0.02. The bits are packed in boxes of a hundred each.
What is the probability that 

(a) a box will have no defective bits; 

(b) the number of defective bits will be less than 3. 

How many bits have to be put in a box so that there should be at least 100 
good bits with a probability not less than 0.9? 

Hint. Take advantage of the Poisson distribution. 

8. An insurance company has issued policies to 10,000 persons of the same 
age and the same social group. The probability of death during the year for each 
person is 0.006. On January 1 each insured person deposits 12 rubles on his policy 
and if he dies his beneficiaries receive 1,000 rubles from the company. What is 
the probability that: 

(a) the company will suffer a loss; 

(b) the company will make a profit of at least 40,000; 60,000; 80,000 rubles?

9. Prove the following theorem: if P and P' are the probabilities of the most
probable number of occurrences of an event A in n and n+1 independent trials
(in each of the trials P{A} = p), then $P' \le P$. The equality is excluded if
(n+1)p is not an integer.

10. In the Bernoulli scheme, p = 1/2. Prove that

(a)
\[ \frac{1}{2\sqrt{n}} \le P_{2n}(n) \le \frac{1}{\sqrt{2n+1}} \]

(b)
\[ \lim_{n \to \infty} \frac{P_{2n}(n \pm h)}{P_{2n}(n)} = e^{-z^2} \]

if $h/\sqrt{n} = z$ ($0 \le z < +\infty$).





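11. Prove that for $npq \ge 25$

\[ P_n(m) = \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{z^2}{2}} \left[ 1 + \frac{(q-p)(z^3 - 3z)}{6\sqrt{npq}} \right] + \Delta \]

where

\[ z = \frac{m - np}{\sqrt{npq}}, \qquad |\Delta| < \frac{0.15 + 0.25\,|p - q|}{\sqrt{(npq)^3}} + e^{-\frac{3}{2}\sqrt{npq}} \]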


12. A total of n independent trials have been performed. The probability of
the occurrence of event A in the ith trial is $p_i$; $P_n(m)$ is the probability of the
m-fold occurrence of event A in n trials. Prove that


^ (n) 


(b) $P_n(m)$ first increases and then decreases (if $P_n(0)$ or $P_n(n)$ are not
themselves maximal).


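13. Prove that for x > 0 the function $\int_x^\infty e^{-z^2/2}\,dz$ satisfies the inequalities

\[ \frac{x}{1+x^2}\, e^{-\frac{x^2}{2}} \le \int_x^\infty e^{-\frac{z^2}{2}}\,dz \le \frac{1}{x}\, e^{-\frac{x^2}{2}} \]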


14. Banach’s match box problem. A certain mathematician always carries two 
boxes of matches with him. Whenever he wants a match, he selects one of the 
boxes at random. Find the probability that when the mathematician draws an
empty box, the other box will contain r matches (r = 0, 1, 2, ..., n; n is the
number of matches initially contained in each box).

15. A total of n machines are connected to an electric transmission line. The
probability that a machine consuming power at time t will cease to consume up
to time $t + \Delta t$ is equal to $\alpha \Delta t + o(\Delta t)$. If at time t a machine is not consuming
any power, then the probability that it will begin consuming prior to time $t + \Delta t$
is equal to $\beta \Delta t + o(\Delta t)$, irrespective of the operation of the other machines.
Form differential equations that are satisfied by the probabilities $P_r(t)$ that at
time t a total of r machines will be consuming power.

Note. It is easy to indicate the concrete conditions of this problem: the 
movement of trams, electric welding, power consumption by machine tools with 
automatic cutoff, and so forth. 

16. One workman operates n automatic machines of the same type. If at
time t a machine is operating, then the probability that it will require attention
prior to time $t + \Delta t$ is equal to $\alpha \Delta t + o(\Delta t)$. If at time t the operator is
attending some machine, then the probability that he will complete his job prior to
time $t + \Delta t$ is equal to $\beta \Delta t + o(\Delta t)$. Form differential equations satisfied by the
probabilities $P_r(t)$ that at time t, n-r machines will be in operation; one is
being attended and r-1 are in line waiting to be serviced ($P_0(t)$ is the
probability that all the machines are in operation).

Note. It is easy to form differential equations in similar fashion for the more
complicated problem when N machines are attended by a team of k workmen.
It is important for practical purposes to compare the economy of one or another
system of organizing the labour. For this purpose, it is necessary to study the
steady-state regime, that is, to consider the probabilities $P_r(t)$ as $t \to \infty$.

It turns out that the work of a team attending kn machines has advantages
over one operator attending n machines both in the meaning of better utilization
of the operating time of the machine and the working time of the operator.



Markov Chains 


Sec. 17. Markov Chains Defined. Transition Matrix 

A direct generalization of the scheme of independent trials is a 
scheme of what are known as Markov chains, which were studied 
systematically for the first time by the noted Russian mathematician 
A. A. Markov. We will confine ourselves to the fundamentals of his 
theory. 

Imagine that we have a sequence of trials in each of which one and
only one of k mutually exclusive events $A_1^{(s)}, A_2^{(s)}, \ldots, A_k^{(s)}$ (as in
Chapter 2, the superscript denotes the number of the trial) can occur.
We say that the sequence of trials forms a Markov chain, or more
precisely a simple Markov chain, if the conditional probability that
event $A_i^{(s+1)}$ ($i = 1, 2, \ldots, k$) will occur in the (s+1)st trial (s = 1, 2,
3, ...), after a known event has occurred in the sth trial, depends
solely on the event that occurred in the sth trial and is not modified by
supplementary information about the events that occurred in earlier trials.

A different terminology is frequently employed in stating the theory
of Markov chains, and one speaks of a certain physical system S,
which at each instant of time can be in one of the states $A_1, A_2, \ldots, A_k$
and alters its state only at times $t_1, t_2, \ldots, t_n, \ldots$. For Markov chains,
the probability of passing to some state $A_i$ ($i = 1, 2, \ldots, k$) at time
$\tau$ ($t_s < \tau < t_{s+1}$) depends only on the state the system was in at time
t ($t_{s-1} < t < t_s$) and does not change if we learn what its states were at
earlier times.

By way of illustration we consider two schematic cases. 

Example 1. Imagine that a particle located on a straight line moves
along the line via random impacts occurring at times $t_1, t_2, t_3, \ldots$.
The particle can be at points with integral coordinates a, a+1, a+2,
..., b; at points a and b there are reflecting barriers. Each impact
displaces the particle to the right with probability p and to the left
with probability q = 1 - p so long as the particle is not located at a
barrier. If the particle is at a barrier, any impact will transfer it one
unit inside the gap between the barriers. We see that this instance of
a particle walk is a typical Markov chain. We could just as easily
consider the case when the particle is sticking to one of the barriers or
to both of them.

Example 2. In Bohr’s model of the hydrogen atom, the electron
can be in one of the allowed orbits. Denote by $A_i$ the event that the
electron lies in the ith orbit. Further assume that changes in the state
of the atom can occur only at times $t_1, t_2, t_3, \ldots$ (actually, these times
are random quantities). The probability of transition from the ith
orbit to the jth at time $t_s$ depends only on i and j (the difference
j - i depends on the amount of energy by which the charge of the atom
changed at time $t_s$) and does not depend on the orbits the electron
occupied in the past.

This case is a Markov chain with an infinite (true, only in principle) 
number of states; this instance would be incomparably closer to a real 
situation if the times of transitions of our system to a new state varied 
continuously. 


* * *


We confine ourselves to the statement of the most elementary facts
for homogeneous Markov chains in which the conditional probability
of the occurrence of an event $A_j^{(s+1)}$ in the (s+1)st trial, provided that
in the sth trial the event $A_i^{(s)}$ occurred, does not depend on the
number of the trial. We call this probability the transition probability
and denote it by $p_{ij}$; in this notation, the first subscript always
denotes the result of the previous trial, and the second indicates
the state into which the system passes in the subsequent instant of
time.

The total probabilistic picture of possible changes that occur
during a transition from one trial to the immediately following one is
given by the matrix

\[ \pi_1 = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1k} \\ p_{21} & p_{22} & \cdots & p_{2k} \\ \cdots & \cdots & \cdots & \cdots \\ p_{k1} & p_{k2} & \cdots & p_{kk} \end{pmatrix} \]

compiled of the transition probabilities; we will call this matrix the
transition matrix (matrix of transition probabilities).

The following examples will serve as illustrations. 

Example 3. The system S that we are studying can be in the states
$A_1$, $A_2$, $A_3$; transition from state to state occurs in accordance with
the scheme of a homogeneous Markov chain; the transition probabilities
are given by the matrix

\[ \pi_1 = \begin{pmatrix} 1/2 & 1/6 & 1/3 \\ 1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} \]

We see that if the system was in the state $A_1$, then after a change
of the state by one step it will remain in the same state with a
probability of 1/2, and it will pass to state $A_2$ with a probability of 1/6,
and to state $A_3$ with a probability of 1/3. But if the system was in
the state $A_2$, then after the transition it can (with equal probability)
find itself only in states $A_1$ and $A_3$; it cannot pass from state $A_2$ into
$A_2$. The last row of the matrix shows us that from the state $A_3$ the
system can pass to any one of the possible states with one and the same
probability 1/3.

Example 4. Let us write the transition matrix for the case,
described in the first example, of a particle in a random walk between
two reflecting barriers. If we denote by $A_1$ the event consisting in
the particle being at a point with coordinate a, by $A_2$, it being at
a point with coordinate a+1, ..., by $A_s$ (s = b - a + 1), it being at a
point with coordinate b, then the transition matrix will be as follows:

\[ \pi_1 = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ q & 0 & p & 0 & \cdots & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix} \]

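Example 5. We also write the transition matrix for a particle in
a random walk between two absorbing barriers. The notations and
the conditions remain the same as in Example 4, the only difference
being that the particle which passes to state $A_1$ or $A_s$ remains in those
states with a probability of 1:

\[ \pi_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots & 0 \\ q & 0 & p & 0 & \cdots & 0 \\ 0 & q & 0 & p & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \]
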


Let us point out what conditions have to be satisfied by the elements
of the transition matrix. First of all, being probabilities, they must
be nonnegative numbers, i.e., for all i and j

\[ p_{ij} \ge 0 \]

Also, from the fact that in the transition from state $A_i^{(s)}$ prior
to the (s+1)st trial the system must definitely pass to one and
only one of the states $A_j^{(s+1)}$ after the (s+1)st trial there follows
the equation

\[ \sum_{j=1}^{k} p_{ij} = 1 \qquad (i = 1, 2, \ldots, k) \]

Thus, the sum of the elements of each row of the transition matrix
is equal to unity.
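
The two conditions just stated (nonnegative elements, each row summing to unity) can be checked mechanically. The following is a minimal sketch in Python, not part of the original text; the helper name `is_transition_matrix` is hypothetical, and exact fractions are used so that the row sums are compared without rounding error. The matrix is the one of Example 3.

```python
from fractions import Fraction as F

# Transition matrix of Example 3 (rows: present state, columns: next state).
PI_1 = [
    [F(1, 2), F(1, 6), F(1, 3)],
    [F(1, 2), F(0),    F(1, 2)],
    [F(1, 3), F(1, 3), F(1, 3)],
]

def is_transition_matrix(matrix):
    """Return True if the matrix is square, nonnegative, and every row sums to 1."""
    k = len(matrix)
    for row in matrix:
        if len(row) != k:
            return False
        if any(entry < 0 for entry in row):
            return False
        if sum(row) != 1:
            return False
    return True

print(is_transition_matrix(PI_1))
```

A matrix that fails either condition, for instance one whose first row sums to 5/6, is rejected by the same check.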

Our first problem in the theory of Markov chains consists in
determining the transition probability from state $A_i^{(s)}$ in the sth
trial to the state $A_j^{(s+n)}$ after n trials. We denote this probability
by the symbol $P_{ij}(n)$.

Let us examine some intermediate trial with the number s + m.
Some one of the possible events $A_r^{(s+m)}$ ($1 \le r \le k$) will occur in
this trial. In accord with the notations just introduced, the
probability of such a transition is equal to $P_{ir}(m)$. And the probability
of transition from state $A_r^{(s+m)}$ to state $A_j^{(s+n)}$ is $P_{rj}(n-m)$. By the
formula of the total probability,

\[ P_{ij}(n) = \sum_{r=1}^{k} P_{ir}(m) \cdot P_{rj}(n-m) \qquad (1) \]

We denote by $\pi_n$ the transition matrix after n trials:

\[ \pi_n = \begin{pmatrix} P_{11}(n) & P_{12}(n) & \cdots & P_{1k}(n) \\ \cdots & \cdots & \cdots & \cdots \\ P_{k1}(n) & P_{k2}(n) & \cdots & P_{kk}(n) \end{pmatrix} \]

According to (1), the following relation holds between the matrices
$\pi_s$ with different subscripts:

\[ \pi_n = \pi_m \cdot \pi_{n-m} \qquad (0 < m < n) \]

In particular, for n = 2, we find:

\[ \pi_2 = \pi_1 \cdot \pi_1 = \pi_1^2 \]

for n = 3

\[ \pi_3 = \pi_1 \cdot \pi_2 = \pi_2 \cdot \pi_1 = \pi_1^3 \]

and, generally, for any n,

\[ \pi_n = \pi_1^n \]

We note a special case of formula (1): for m = 1

\[ P_{ij}(n) = \sum_{r=1}^{k} p_{ir} P_{rj}(n-1) \]
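
Formula (1) and its matrix form can be tried out numerically. The sketch below is an illustration in Python, not the author's method; the function names `mat_mul` and `n_step` are hypothetical. It computes the n-step matrix by repeated multiplication and compares the five-step matrix with the product of the two-step and three-step matrices, using the matrix of Example 3 and exact fractions.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Product of two k x k matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][r] * b[r][j] for r in range(k)) for j in range(k)]
            for i in range(k)]

def n_step(pi_1, n):
    """n-step transition matrix, i.e. the nth power of pi_1 (n >= 1)."""
    result = pi_1
    for _ in range(n - 1):
        result = mat_mul(result, pi_1)
    return result

# Transition matrix of Example 3.
PI_1 = [
    [F(1, 2), F(1, 6), F(1, 3)],
    [F(1, 2), F(0),    F(1, 2)],
    [F(1, 3), F(1, 3), F(1, 3)],
]

pi_2 = n_step(PI_1, 2)
pi_3 = n_step(PI_1, 3)
pi_5 = n_step(PI_1, 5)

# Formula (1) in matrix form with n = 5 and m = 2.
chapman_kolmogorov_holds = (pi_5 == mat_mul(pi_2, pi_3))
print(chapman_kolmogorov_holds)
print(pi_2[0])  # two-step probabilities starting from the first state
```

Repeated multiplication is the simplest choice here; for large n one would normally square the matrix repeatedly instead.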


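Example 6. A simple count shows that the two-step transition
matrices of Examples 4 and 5 of this section are of the following
form:

for a random walk of a particle between reflecting barriers ($s \ge 5$)

\[ \begin{pmatrix} q & 0 & p & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & q+pq & 0 & p^2 & 0 & \cdots & 0 & 0 & 0 \\ q^2 & 0 & 2pq & 0 & p^2 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & q & 0 & p \end{pmatrix} \]

for a particle in a random walk between absorbing barriers

\[ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ q & pq & 0 & p^2 & 0 & \cdots & 0 \\ q^2 & 0 & 2pq & 0 & p^2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & 1 \end{pmatrix} \]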

It is intuitively clear that in the case of reflecting barriers a particle
will, after a large number of steps, be able to reach any point between
the barriers. But in the case of absorbing barriers, the larger the
number of steps a system has covered, the greater the probability that
the particle will be absorbed by the barriers.
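
The two-step matrices of the random walks can be reproduced by squaring the one-step matrices. The following Python sketch is an illustration under assumed values: s = 5 states and p = 1/3 are arbitrary choices, and the helper names `walk_matrix` and `absorbed_within` are hypothetical. It also tabulates the probability of absorption within n steps for a particle that starts in the middle state.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Product of two k x k matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][r] * b[r][j] for r in range(k)) for j in range(k)]
            for i in range(k)]

def walk_matrix(s, p, absorbing):
    """One-step matrix of the walk on s points with barriers at both ends."""
    q = 1 - p
    m = [[F(0)] * s for _ in range(s)]
    for i in range(1, s - 1):          # interior points: left with q, right with p
        m[i][i - 1] = q
        m[i][i + 1] = p
    if absorbing:                      # the particle stays at a barrier
        m[0][0] = F(1)
        m[s - 1][s - 1] = F(1)
    else:                              # the particle is thrown back inside
        m[0][1] = F(1)
        m[s - 1][s - 2] = F(1)
    return m

p = F(1, 3)
q = 1 - p
reflecting = walk_matrix(5, p, absorbing=False)
absorbing = walk_matrix(5, p, absorbing=True)
refl_2 = mat_mul(reflecting, reflecting)
abs_2 = mat_mul(absorbing, absorbing)

def absorbed_within(n):
    """Probability that a particle starting in the middle state is absorbed in n steps."""
    power = absorbing
    for _ in range(n - 1):
        power = mat_mul(power, absorbing)
    return power[2][0] + power[2][4]

for n in (2, 4, 10, 20):
    print(n, float(absorbed_within(n)))
```

The printed probabilities grow with n, in line with the remark on absorbing barriers.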

Sec. 18. Classification of Possible States 

The classification of states offered here was described at almost
the same time by A. N. Kolmogorov for Markov chains with a
countable set of states and by W. Doeblin for Markov chains with a finite
set of states.

The state $A_i$ is called unessential (or transient) if there exist $A_j$
and n such that $P_{ij}(n) > 0$ but $P_{ji}(m) = 0$ for all m. Thus, an
unessential state possesses the property that it is possible, with positive
probability, to pass from it into some other state, but it is no longer possible
to return from that state to the original (unessential) state. Of the
examples of the preceding section, we consider the fifth: the random
walk of a particle between two absorbing barriers. It is easy to see
that in this example all the states, except $A_1$ and $A_s$, are unessential.
Indeed, no matter what the state (different from $A_1$ and $A_s$) a particle
is in, it can reach both $A_1$ and $A_s$ with positive probabilities via a
finite number of steps, but it cannot return from these states into any
other state.

All states not unessential are called essential. From the definition
it follows that if the states $A_i$ and $A_j$ are essential, then there exist
positive m and n such that along with the inequality $P_{ij}(m) > 0$
the inequality $P_{ji}(n) > 0$ also holds. If $A_i$ and $A_j$ are such that for
both of them these inequalities hold, given certain m and n, then they
are called communicating (they are said to communicate). It is clear
that if $A_i$ communicates with $A_j$, and $A_j$ communicates with $A_k$,
then $A_i$ also communicates with $A_k$. Thus, all essential states can be
partitioned into classes such that all states belonging to a single class
communicate and those belonging to different classes do not
communicate. All states in Examples 3 and 4 of the preceding section are
essential and in each case form a unique class of states.

Since for the essential state $A_i$ and the unessential state $A_j$ the
equation $P_{ij}(m) = 0$ holds for any m, we can draw the following
conclusion: if a system has reached one of the states of a definite class
of essential states, it can no longer leave that class. Example 5
exhibits two classes of essential states, each of which consists of a single
element: one class is the state $A_1$ and the other is the state $A_s$.
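
The classification into unessential states and classes of essential states depends only on which transitions have positive probability, so it can be carried out on the reachability relation alone. The following Python sketch is one possible way to do this and is not taken from the text; the function names are hypothetical, states are numbered from 0 (so index 0 stands for $A_1$), and the walk with absorbing barriers uses the assumed values s = 5 and p = 1/2.

```python
def reachable(matrix):
    """reach[i][j] is True if state j can be reached from state i in n >= 1 steps."""
    k = len(matrix)
    reach = [[matrix[i][j] > 0 for j in range(k)] for i in range(k)]
    for r in range(k):                 # Warshall's transitive closure
        for i in range(k):
            for j in range(k):
                if reach[i][r] and reach[r][j]:
                    reach[i][j] = True
    return reach

def classify(matrix):
    """Return (list of unessential states, list of classes of essential states)."""
    reach = reachable(matrix)
    k = len(matrix)
    unessential = [i for i in range(k)
                   if any(reach[i][j] and not reach[j][i] for j in range(k))]
    classes = []
    for i in range(k):
        if i in unessential:
            continue
        for cls in classes:            # join the class of a communicating state
            if reach[i][cls[0]] and reach[cls[0]][i]:
                cls.append(i)
                break
        else:
            classes.append([i])
    return unessential, classes

# Walk between absorbing barriers (Example 5) with s = 5 and p = q = 1/2.
absorbing_walk = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
# Matrix of Example 3.
example_3 = [
    [1 / 2, 1 / 6, 1 / 3],
    [1 / 2, 0.0, 1 / 2],
    [1 / 3, 1 / 3, 1 / 3],
]

print(classify(absorbing_walk))
print(classify(example_3))
```

For the absorbing walk the interior states come out unessential and the two barriers form two one-element classes; for Example 3 all states fall into a single class.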

Let us now examine more closely the mechanism of transition from
state to state inside one class. To do this, take some essential state $A_i$
and denote by $M_i$ the set of all integers m for which $P_{ii}(m) > 0$. This
set cannot be empty by virtue of the definition of an essential state.
It is immediately obvious that if the numbers m and n are contained
in the set $M_i$, then their sum, m+n, also belongs to this set. Denote
by $d_i$ the greatest common divisor of all the numbers of the set $M_i$.
It is clear that $M_i$ consists only of numbers which are multiples of
$d_i$.* The number $d_i$ is called the period of the state $A_i$.

Let $A_i$ and $A_j$ be two states belonging to one class. From the
foregoing it follows that there exist m and n such that $P_{ij}(m) > 0$ and
$P_{ji}(n) > 0$. The number m+n naturally belongs to $M_i$ and,
consequently, is divisible by $d_i$. Let r be an arbitrary and sufficiently large
number. Then $rd_j$ belongs to $M_j$ and, hence, $P_{jj}(rd_j) > 0$.

But since

\[ P_{ii}(m + rd_j + n) \ge P_{ij}(m)\,P_{jj}(rd_j)\,P_{ji}(n) > 0 \]

it follows that all numbers of the form $m + rd_j + n$, given sufficiently
large r, belong to the set $M_i$. Since, by the foregoing, the number
m+n is divisible by $d_i$, it follows that $rd_j$ should be divisible by $d_i$,
and since r is arbitrary, $d_j$ should be divisible by $d_i$. By similar
reasoning we can prove that $d_i$ is divisible by $d_j$. From this it follows
that $d_i = d_j$.

Thus, all states of one and the same class have one and the same period. 
(We shall denote it by d.) 

The result thus obtained permits us to draw the following conclusion:
for two states $A_i$ and $A_j$ belonging to one and the same class, the
inequalities $P_{ij}(m) > 0$ and $P_{ji}(n) > 0$ can hold only when m and -n
are congruent modulo d.** Thus, if we select a definite state $A_\alpha$ of
the class under study, then to each state $A_i$ of this class we can assign
a definite number $\beta(i)$ ($\beta(i) = 1, 2, \ldots, d$) such that the inequality
$P_{\alpha i}(n) > 0$ is possible solely for values of n that satisfy the congruence


* It is easy to notice that $M_i$ contains all the sufficiently large numbers
that are multiples of $d_i$.

** In other words, if the sum m+n is evenly divisible by d.





$n \equiv \beta(i) \pmod{d}$. We combine into a subclass $S_\beta$ all the states $A_i$ to
which the number $\beta$ has been assigned. To summarize, then, the class
of essential states is found to be partitioned into d subclasses $S_\beta$. These
subclasses possess the property that for each step the system can pass
from a state belonging to the subclass $S_\beta$ into only one of the states
of the subclass $S_{\beta+1}$. But if $\beta = d$, then the system passes into one of
the states of the subclass $S_1$.

Let $A_i$ belong to subclass $S_\beta$ and $A_j$ to subclass $S_\gamma$. From the
foregoing it is clear that the probability $P_{ij}(n)$ may be different from
zero only when $n \equiv \gamma - \beta \pmod{d}$. But if n satisfies this congruence
and is sufficiently great, then the inequality $P_{ij}(n) > 0$ indeed holds.

By way of illustration consider Example 4 of Sec. 17. We see that
all the states of the system form one class. Since it is possible, with
positive probability, to pass from the state $A_i$, given any i, to the
same state in two steps (and not less than two), it is clear that d=2.
Thus all the states of the system are subdivided into two subclasses
$S_1$ and $S_2$. Put in subclass $S_1$ all states with odd-numbered subscripts
and in subclass $S_2$ all states with even-numbered subscripts. It is
clear that, in one step, it is only possible to pass from each state of
the subclass $S_1$ to a state of the subclass $S_2$ in the same way as from
the subclass $S_2$ only into the subclass $S_1$.
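
The period of a state can be estimated by collecting the numbers of steps at which a return has positive probability and taking their greatest common divisor. The sketch below is a hedged illustration in Python: the function name `period` and the cutoff `max_steps` are assumptions of this example, and a finite cutoff gives the true period only if it is large enough to capture the relevant return times (50 steps is ample for the small chains used here). States are numbered from 0.

```python
from math import gcd

def period(matrix, state, max_steps=50):
    """gcd of all n <= max_steps with positive n-step return probability (0 if none)."""
    k = len(matrix)
    # Only the pattern of positive entries matters, so booleans are enough.
    step = [[matrix[i][j] > 0 for j in range(k)] for i in range(k)]
    support = [row[:] for row in step]          # pattern of the n-step matrix, n = 1
    d = 0
    for n in range(1, max_steps + 1):
        if support[state][state]:
            d = gcd(d, n)
        support = [[any(support[i][r] and step[r][j] for r in range(k))
                    for j in range(k)] for i in range(k)]
    return d

# Walk between reflecting barriers (Example 4) with s = 5 and p = q = 1/2.
reflecting_walk = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
# Matrix of Example 3.
example_3 = [
    [1 / 2, 1 / 6, 1 / 3],
    [1 / 2, 0.0, 1 / 2],
    [1 / 3, 1 / 3, 1 / 3],
]

print([period(reflecting_walk, i) for i in range(5)])
print([period(example_3, i) for i in range(3)])
```

All states of the reflecting walk get the same period, as the general result on states of one class requires.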

Sec. 19. Theorem on Limiting Probabilities 

Theorem. If for some s > 0 all elements of the transition matrix $\pi_s$
are positive, then there exist constant numbers $p_j$ ($j = 1, 2, \ldots, k$) such
that, irrespective of the subscript i, the equalities

\[ \lim_{n \to \infty} P_{ij}(n) = p_j \]

hold.

Proof. The idea of the proof of this theorem is exceedingly simple:
it is first established that the greatest of the probabilities $P_{ij}(n)$
cannot increase with growth of n and the least cannot decrease. It
is then shown that the maximum of the difference $P_{ij}(n) - P_{lj}(n)$
($i, l = 1, 2, \ldots, k$) tends to zero when $n \to \infty$. This obviously completes
the proof of the theorem. Indeed, by virtue of the well-known theorem
on the limit of a monotonic bounded sequence we conclude from the
first two indicated properties of the probabilities $P_{ij}(n)$ that there
exist

\[ \lim_{n \to \infty} \min_{1 \le i \le k} P_{ij}(n) = \underline{p}_j, \qquad \lim_{n \to \infty} \max_{1 \le i \le k} P_{ij}(n) = \bar{p}_j \]

And since by virtue of the third of the indicated properties

\[ \lim_{n \to \infty} \max_{1 \le i,\, l \le k} |P_{ij}(n) - P_{lj}(n)| = 0 \]

it follows that

\[ \underline{p}_j = \bar{p}_j = p_j \]

Let us now begin to carry out our plan. First of all we notice
that for n > 1 we have the inequality

\[ P_{ij}(n) = \sum_{l=1}^{k} p_{il} P_{lj}(n-1) \ge \min_{1 \le l \le k} P_{lj}(n-1) \sum_{l=1}^{k} p_{il} = \min_{1 \le l \le k} P_{lj}(n-1) \qquad (1) \]

This inequality holds for every i, in particular for the one at which

\[ P_{ij}(n) = \min_{1 \le l \le k} P_{lj}(n) \]

Thus,

\[ \min_{1 \le i \le k} P_{ij}(n) \ge \min_{1 \le l \le k} P_{lj}(n-1) \]

In similar fashion it is easy to notice that

\[ \max_{1 \le i \le k} P_{ij}(n) \le \max_{1 \le l \le k} P_{lj}(n-1) \]

l<i<k 

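We can assume that n > s, and therefore we have the right to
write down, according to (1),

\[ P_{ij}(n) = \sum_{r=1}^{k} P_{ir}(s) \cdot P_{rj}(n-s) \]

We consider the difference

\[ P_{ij}(n) - P_{lj}(n) = \sum_{r=1}^{k} P_{ir}(s) P_{rj}(n-s) - \sum_{r=1}^{k} P_{lr}(s) P_{rj}(n-s) = \sum_{r=1}^{k} [P_{ir}(s) - P_{lr}(s)]\, P_{rj}(n-s) \]

Denote the positive differences $P_{ir}(s) - P_{lr}(s)$ by the symbol $\beta_{il}^{(r)}$,
and the nonpositive differences by $-\beta_{il}'^{(r)}$. Since

\[ \sum_{r=1}^{k} P_{ir}(s) = \sum_{r=1}^{k} P_{lr}(s) = 1 \]

it follows that

\[ \sum_{r=1}^{k} [P_{ir}(s) - P_{lr}(s)] = \sum_{(r)} \beta_{il}^{(r)} - \sum_{(r)} \beta_{il}'^{(r)} = 0 \qquad (2) \]

where each sum on the right extends over the corresponding values of r.
From this equation we conclude that

\[ h_{il} = \sum_{(r)} \beta_{il}^{(r)} = \sum_{(r)} \beta_{il}'^{(r)} \]

Since by assumption for all i and r (i, r = 1, 2, 3, ..., k) $P_{ir}(s) > 0$,
it follows that

\[ \sum_{(r)} \beta_{il}^{(r)} < \sum_{r=1}^{k} P_{ir}(s) = 1 \]

And so

\[ 0 \le h_{il} < 1 \]

Let

\[ h = \max_{1 \le i,\, l \le k} h_{il} \]

Since the number of possible outcomes is finite, the quantity h
(along with the quantities $h_{il}$) satisfies the inequalities

\[ 0 \le h < 1 \qquad (3) \]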


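From (1) we find that for any i and l (i, l = 1, 2, ..., k)

\[ \begin{aligned} |P_{ij}(n) - P_{lj}(n)| &= \left| \sum_{(r)} \beta_{il}^{(r)} P_{rj}(n-s) - \sum_{(r)} \beta_{il}'^{(r)} P_{rj}(n-s) \right| \\ &\le \left| \max_{1 \le r \le k} P_{rj}(n-s) \sum_{(r)} \beta_{il}^{(r)} - \min_{1 \le r \le k} P_{rj}(n-s) \sum_{(r)} \beta_{il}'^{(r)} \right| \\ &= h_{il} \left| \max_{1 \le r \le k} P_{rj}(n-s) - \min_{1 \le r \le k} P_{rj}(n-s) \right| \\ &\le h \max_{1 \le i,\, l \le k} |P_{ij}(n-s) - P_{lj}(n-s)| \end{aligned} \]

and consequently, also,

\[ \max_{1 \le i,\, l \le k} |P_{ij}(n) - P_{lj}(n)| \le h \max_{1 \le i,\, l \le k} |P_{ij}(n-s) - P_{lj}(n-s)| \]

Applying this inequality $[n/s]$ times, we find

\[ \max_{1 \le i,\, l \le k} |P_{ij}(n) - P_{lj}(n)| \le h^{[n/s]} \max_{1 \le i,\, l \le k} \left| P_{ij}\!\left(n - \left[\tfrac{n}{s}\right] s\right) - P_{lj}\!\left(n - \left[\tfrac{n}{s}\right] s\right) \right| \]

Since we always have

\[ |P_{ij}(m) - P_{lj}(m)| \le 1 \]

it follows that

\[ \max_{1 \le i,\, l \le k} |P_{ij}(n) - P_{lj}(n)| \le h^{[n/s]} \]

When $n \to \infty$ then $[n/s] \to \infty$ also; for this reason, by virtue of
(3) it follows that

\[ \lim_{n \to \infty} \max_{1 \le i,\, l \le k} |P_{ij}(n) - P_{lj}(n)| = 0 \]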


From what has been proved we also conclude that

\[ \sum_{j=1}^{k} p_j = 1 \]

Indeed,

\[ \sum_{j=1}^{k} p_j = \lim_{n \to \infty} \sum_{j=1}^{k} P_{ij}(n) = \lim_{n \to \infty} 1 = 1 \]


Thus, we can regard the quantities $p_j$ as probabilities of the
occurrence of the outcome $A_j$ in the nth trial when n is great.

The physical meaning of the theorem just proved is clear: the
probability of a system being in the state $A_j$ is practically independent
of the state that it was in in the remote past.
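
The content of the theorem can be observed numerically on the matrix of Example 3, all of whose two-step probabilities are positive. The Python sketch below is an illustration only; the variable names are hypothetical and floating-point arithmetic is used. For each n it records the largest difference between two elements of the same column of the n-step matrix, which by the proof should not increase and should tend to zero, and then reads the limiting probabilities off any row.

```python
def mat_mul(a, b):
    """Product of two k x k matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][r] * b[r][j] for r in range(k)) for j in range(k)]
            for i in range(k)]

# Matrix of Example 3.
PI_1 = [
    [1 / 2, 1 / 6, 1 / 3],
    [1 / 2, 0.0, 1 / 2],
    [1 / 3, 1 / 3, 1 / 3],
]

power = PI_1
spreads = []
for n in range(1, 41):
    # Largest difference between two n-step probabilities with the same end state.
    spreads.append(max(max(col) - min(col) for col in zip(*power)))
    power = mat_mul(power, PI_1)

limits = power[0]      # after the loop every row is practically the same
print(spreads[:5])
print(limits, sum(limits))
```

Solving the equations $p_j = \sum_i p_i p_{ij}$ together with $\sum_j p_j = 1$ by hand for this matrix gives 18/41, 8/41, 15/41, which the computed row should approach.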

The above theorem was first proved by the creator of the theory 
of chain dependences A. A. Markov. It was the first rigorously proved 
result of the so-called ergodic theorems that play an important role 
in modern physics. 

It may be proved that if the possible states of a system form a
single essential class, then the ergodic theorem holds.


Sec. 20. Generalizing the DeMoivre-Laplace Theorem 
to a Sequence of Chain-Dependent Trials 

We shall now focus our attention on a sequence of trials, in each
of which an event E may or may not occur. We shall assume that the
trials are not independent, but are connected into a simple Markov
chain. Thus, if in the kth trial the event E occurred, then the
probability that in the next, (k+1)st, trial event E will again occur is $\alpha$;
now the probability that event E will occur in the (k+1)st trial,
given that in the kth trial event $\bar{E}$ occurred, is $\beta$. Hence, in our case,
the transition probabilities are given by the matrix

\[ \begin{pmatrix} \alpha & 1-\alpha \\ \beta & 1-\beta \end{pmatrix} \]

We shall henceforward assume that both $\alpha$ and $\beta$ are different from
0 and 1, for these cases are of no particular interest. The scheme at
hand is understandably a natural generalization of the scheme of
independent trials proposed by James Bernoulli and examined by us
in the preceding chapter.





We must note that assigning the transition matrix does not
completely specify the system of trials, because the first trial has no
precedent and, consequently, the probabilities of occurrence of events
E and $\bar{E}$ in the first trial are unknown to us. We therefore denote by
$p_1$ the probability of occurrence of E in the first trial and by $q_1 = 1 - p_1$
the probability of event $\bar{E}$ occurring in the first trial.

We first solve the two following problems: (1) to find the probability
that event E will occur in the kth trial; (2) to find the probability
that E will occur in the jth trial if event E occurred in the ith trial
(i < j).

Denote by $p_k$ the probability that event E will occur in the kth trial
and put $q_k = 1 - p_k$. It is obvious that in the kth trial E can occur in
two mutually exclusive ways: event E will occur in the (k-1)st trial
and will occur again in the next trial; event $\bar{E}$ will occur in the
(k-1)st trial and event E will occur in the next trial. Using the
formula for total probability, we find that

\[ p_k = p_{k-1}\,\alpha + q_{k-1}\,\beta \]

Since $q_{k-1} = 1 - p_{k-1}$, then setting $\delta = \alpha - \beta$ we find

\[ p_k = p_{k-1}\,\delta + \beta \]

In particular, when k = 2

\[ p_2 = p_1 \delta + \beta \]

When k = 3

\[ p_3 = p_2 \delta + \beta = p_1 \delta^2 + \beta(1 + \delta) \]

It is easy to verify that for any k > 1

\[ p_k = p_1 \delta^{k-1} + \beta(1 + \delta + \ldots + \delta^{k-2}) = \left( p_1 - \frac{\beta}{1-\delta} \right) \delta^{k-1} + \frac{\beta}{1-\delta} \qquad (1) \]

Given the assumptions we have made relative to $\alpha$ and $\beta$, the
quantity $\delta$ satisfies the inequality $|\delta| < 1$. From the preceding
formula it follows that as $k \to \infty$

\[ p_k \to \frac{\beta}{1-\delta} \]

It is interesting to note that the constant to which $p_k$ tend does
not depend on the probability $p_1$.
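
The recurrence for the probability of E in the kth trial and its closed form can be compared directly. The following Python sketch is illustrative; the function names and the numerical values $\alpha = 7/10$, $\beta = 2/5$, $p_1 = 1/4$ are arbitrary assumptions. Exact fractions are used so that the two expressions can be tested for equality.

```python
from fractions import Fraction as F

def p_k_recursive(p1, alpha, beta, k):
    """Probability of E in the kth trial from the recurrence p_k = p_(k-1)*delta + beta."""
    delta = alpha - beta
    p = p1
    for _ in range(k - 1):
        p = p * delta + beta
    return p

def p_k_closed(p1, alpha, beta, k):
    """Closed form (p_1 - beta/(1-delta)) * delta**(k-1) + beta/(1-delta)."""
    delta = alpha - beta
    limit = beta / (1 - delta)
    return (p1 - limit) * delta ** (k - 1) + limit

alpha, beta, p1 = F(7, 10), F(2, 5), F(1, 4)
for k in (1, 2, 3, 10):
    print(k, p_k_recursive(p1, alpha, beta, k), p_k_closed(p1, alpha, beta, k))

# The limiting value beta / (1 - delta) does not involve p1.
print(beta / (1 - (alpha - beta)))
```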

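Since the quantity $\beta/(1-\delta)$ plays the role of a “limiting probability”,
it is natural to introduce the notation

\[ p = \frac{\beta}{1 - \alpha + \beta} = \frac{\beta}{1-\delta}, \qquad q = 1 - p = \frac{1-\alpha}{1-\delta} \]

In this notation,

\[ p_k = p + (p_1 - p)\,\delta^{k-1} \qquad (1') \]

Now denote by $p_j^{(i)}$ the probability of event E occurring in the
jth trial if it occurred in the ith trial. Proceeding as we have
just done, it will become clear that the probabilities $p_j^{(i)}$ satisfy
the difference equation

\[ p_j^{(i)} = p_{j-1}^{(i)}\,\delta + \beta \]

for all j > i+1. But $p_{i+1}^{(i)} = \alpha$, and therefore, applying the
procedure just utilized, we find

\[ p_j^{(i)} = \alpha\,\delta^{j-i-1} + \beta(1 + \delta + \ldots + \delta^{j-i-2}) = \frac{\beta}{1-\delta} + \frac{1-\alpha}{1-\delta}\,\delta^{j-i} \qquad (2) \]

or

\[ p_j^{(i)} = p + q\,\delta^{j-i} \qquad (2') \]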

Let us now look for the probability of the m-fold occurrence of
event E among n trials. For this purpose, we break up the desired
probability, which we continue to denote by $P_n(m)$, into four
summands:

\[ P_n(m) = P_n(m, EE) + P_n(m, E\bar{E}) + P_n(m, \bar{E}E) + P_n(m, \bar{E}\bar{E}) \]

The first term signifies the probability of the m-fold occurrence
of event E in n trials on condition that in the first trial and last
trial the event E will occur. The meaning of the other notations
is now clear without any further explanations. To evaluate $P_n(m, EE)$
we first consider the following arrangement of trial results:

in the first $r_1$ trials, events E occurred
then in $s_1$ trials, events $\bar{E}$ occurred
then in $r_2$ trials, events E occurred
. . . . . . . . . . . . . . . . . . . .
then in $s_{k-1}$ trials, events $\bar{E}$ occurred
then in $r_k$ trials, events E occurred

As will readily be seen, the probability of such an outcome is

\[ p_1 \alpha^{r_1 - 1} (1-\alpha)(1-\beta)^{s_1 - 1} \beta\, \alpha^{r_2 - 1} \ldots \beta\, \alpha^{r_k - 1} = p_1 \alpha^{r_1 + \ldots + r_k - k} (1-\alpha)^{k-1} (1-\beta)^{s_1 + \ldots + s_{k-1} - k + 1} \beta^{k-1} \]

But since

\[ \sum_{i=1}^{k} r_i = m, \qquad \sum_{i=1}^{k-1} s_i = n - m \]

this probability is

\[ p_1 \alpha^{m-k} (1-\alpha)^{k-1} (1-\beta)^{n-m-k+1} \beta^{k-1} \]

Note that it is dependent solely on m, n and k and is independent
of the values of $r_i$ and $s_i$. Since the number m may be decomposed
into k positive summands in $C_{m-1}^{k-1}$ ways, and n-m may be
represented in the form of k-1 positive summands in $C_{n-m-1}^{k-2}$ ways,
the probability of the m-fold occurrence of event E, in which
events E will occur in the form of k groups and events $\bar{E}$ in the
form of k-1 groups, is

\[ p_1 C_{m-1}^{k-1} \alpha^{m-k} (1-\alpha)^{k-1} C_{n-m-1}^{k-2} (1-\beta)^{n-m-k+1} \beta^{k-1} \]

Since k can take on any value from 2 to m,

\[ P_n(m, EE) = p_1 \sum_{k=2}^{m} C_{m-1}^{k-1} \alpha^{m-k} (1-\alpha)^{k-1} C_{n-m-1}^{k-2} (1-\beta)^{n-m-k+1} \beta^{k-1} \]



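In similar fashion we find:

\[ P_n(m, E\bar{E}) = p_1 \sum_{k=1}^{m} C_{m-1}^{k-1} \alpha^{m-k} (1-\alpha)^{k} C_{n-m-1}^{k-1} (1-\beta)^{n-m-k} \beta^{k-1} \]

\[ P_n(m, \bar{E}E) = q_1 \sum_{k=1}^{m} C_{m-1}^{k-1} \alpha^{m-k} (1-\alpha)^{k-1} C_{n-m-1}^{k-1} (1-\beta)^{n-m-k} \beta^{k} \]

\[ P_n(m, \bar{E}\bar{E}) = q_1 \sum_{k=2}^{m+1} C_{m-1}^{k-2} \alpha^{m-k+1} (1-\alpha)^{k-1} C_{n-m-1}^{k-1} (1-\beta)^{n-m-k} \beta^{k-1} \]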
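
The four sums can be checked against a direct enumeration for small n. The sketch below is a verification aid in Python rather than anything from the text: `brute_force` adds up the probabilities of all sequences of n outcomes with m occurrences of E and prescribed first and last outcomes, while `closed_form` evaluates the corresponding sum over the number of groups (for the case with not-E at both ends, k counts the groups of not-E and runs from 2 to m+1). The function names and the numerical values are assumptions of this example, and the check is restricted to 1 <= m <= n-1.

```python
from fractions import Fraction as F
from itertools import product
from math import comb

def brute_force(n, m, alpha, beta, p1, first, last):
    """Total probability of sequences with m occurrences of E and given end outcomes.

    True stands for E, False for not-E.
    """
    total = F(0)
    for seq in product([True, False], repeat=n):
        if sum(seq) != m or seq[0] != first or seq[-1] != last:
            continue
        prob = p1 if seq[0] else 1 - p1
        for prev, cur in zip(seq, seq[1:]):
            p_e = alpha if prev else beta       # probability of E after prev
            prob *= p_e if cur else 1 - p_e
        total += prob
    return total

def closed_form(n, m, a, b, p1, first, last):
    """Sum over the number of groups; valid for 1 <= m <= n - 1."""
    q1 = 1 - p1
    if first and last:                          # E at both ends
        return p1 * sum(
            comb(m - 1, k - 1) * a ** (m - k) * (1 - a) ** (k - 1)
            * comb(n - m - 1, k - 2) * (1 - b) ** (n - m - k + 1) * b ** (k - 1)
            for k in range(2, m + 1))
    if first:                                   # E first, not-E last
        return p1 * sum(
            comb(m - 1, k - 1) * a ** (m - k) * (1 - a) ** k
            * comb(n - m - 1, k - 1) * (1 - b) ** (n - m - k) * b ** (k - 1)
            for k in range(1, m + 1))
    if last:                                    # not-E first, E last
        return q1 * sum(
            comb(m - 1, k - 1) * a ** (m - k) * (1 - a) ** (k - 1)
            * comb(n - m - 1, k - 1) * (1 - b) ** (n - m - k) * b ** k
            for k in range(1, m + 1))
    return q1 * sum(                            # not-E at both ends
        comb(m - 1, k - 2) * a ** (m - k + 1) * (1 - a) ** (k - 1)
        * comb(n - m - 1, k - 1) * (1 - b) ** (n - m - k) * b ** (k - 1)
        for k in range(2, m + 2))

alpha, beta, p1 = F(7, 10), F(2, 5), F(1, 4)
n = 6
all_match = all(
    brute_force(n, m, alpha, beta, p1, first, last)
    == closed_form(n, m, alpha, beta, p1, first, last)
    for m in range(1, n)
    for first in (True, False)
    for last in (True, False))
print(all_match)
```

Exact fractions keep the comparison free of rounding; terms whose binomial coefficient vanishes contribute nothing even when the accompanying exponent is negative.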

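To evaluate all these four probabilities, consider the expression

\[ A_{mn} = \sum_{k=1}^{m} C_m^k \alpha^{m-k} (1-\alpha)^{k} C_{n-m}^k (1-\beta)^{n-m-k} \beta^{k} \]

and introduce the notations

\[ m = np + z\, \frac{\sqrt{m\alpha(1-\alpha) + (n-m)\beta(1-\beta)}}{1 - \alpha + \beta} \qquad (3) \]

and

\[ k = m(1-\alpha) + u\sqrt{m\alpha(1-\alpha)} \]
\[ k = (n-m)\beta + v\sqrt{(n-m)\beta(1-\beta)} \]

We will carry out the computations assuming that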

« = o(m'/®), v = o{ri^) 

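where $\gamma$ is some number that satisfies the inequalities $0 < \gamma < 1/6$.
We decompose the quantity $A_{mn}$ into three summands:

\[ A_{mn} = \Sigma_1 + \Sigma_2 + \Sigma_3 \]

putting

\[ \Sigma_1 = \sum_{k=1}^{m(1-\alpha) - u_1\sqrt{m\alpha(1-\alpha)}}, \qquad \Sigma_2 = \sum_{k = m(1-\alpha) - u_1\sqrt{m\alpha(1-\alpha)}}^{m(1-\alpha) + u_1\sqrt{m\alpha(1-\alpha)}}, \qquad \Sigma_3 = \sum_{k = m(1-\alpha) + u_1\sqrt{m\alpha(1-\alpha)}}^{m} \]

where each sum has the same summand as $A_{mn}$.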


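We begin the computations with the middle sum.

Repeating word-for-word the arguments given in the proof of the
local theorem of DeMoivre-Laplace, we find

\[ C_m^k \alpha^{m-k} (1-\alpha)^{k} = \frac{1}{\sqrt{2\pi\alpha(1-\alpha)m}}\, e^{-\frac{u^2}{2}} (1 + \omega_n') \]

\[ C_{n-m}^k (1-\beta)^{n-m-k} \beta^{k} = \frac{1}{\sqrt{2\pi(n-m)\beta(1-\beta)}}\, e^{-\frac{v^2}{2}} (1 + \omega_n'') \]

The quantities $\omega_n'$ and $\omega_n''$ approach zero uniformly within the
chosen bounds.

Thus,

\[ C_m^k \alpha^{m-k} (1-\alpha)^{k} C_{n-m}^k (1-\beta)^{n-m-k} \beta^{k} = \frac{1}{2\pi\sqrt{m(n-m)\alpha\beta(1-\alpha)(1-\beta)}}\, e^{-\frac{u^2+v^2}{2}} (1 + \omega_n')(1 + \omega_n'') \]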

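According to the DeMoivre-Laplace integral theorem,

\[ \Sigma_2 = \frac{1}{2\pi\sqrt{(n-m)\beta(1-\beta)}} \int_{-u_1}^{u_1} e^{-\frac{u^2+v^2}{2}}\, du\, (1 + \omega_n) \]

Since $u_1$ tends to $\infty$ together with n, we can write

\[ \Sigma_2 = \frac{1}{2\pi\sqrt{(n-m)\beta(1-\beta)}} \int_{-\infty}^{\infty} e^{-\frac{u^2+v^2}{2}}\, du\, (1 + \omega_n) \]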

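But u and v are connected by the equation

\[ m(1-\alpha) + u\sqrt{m\alpha(1-\alpha)} = (n-m)\beta + v\sqrt{(n-m)\beta(1-\beta)} \]

Substitute here the value of m from (3). After obvious
simplifications we find

\[ z\sqrt{m\alpha(1-\alpha) + (n-m)\beta(1-\beta)} + u\sqrt{m\alpha(1-\alpha)} = v\sqrt{(n-m)\beta(1-\beta)} \]

Whence

\[ v = \frac{1}{\sqrt{(n-m)\beta(1-\beta)}} \left[ z\sqrt{m\alpha(1-\alpha) + (n-m)\beta(1-\beta)} + u\sqrt{m\alpha(1-\alpha)} \right] \]

Thus,

\[ u^2 + v^2 = \frac{m\alpha(1-\alpha) + (n-m)\beta(1-\beta)}{(n-m)\beta(1-\beta)} \left[ u + z\sqrt{\frac{m\alpha(1-\alpha)}{m\alpha(1-\alpha) + (n-m)\beta(1-\beta)}} \right]^2 + z^2 \]

and, consequently,

\[ \Sigma_2 = \frac{1}{\sqrt{2\pi[m\alpha(1-\alpha) + (n-m)\beta(1-\beta)]}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]
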

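We now note that according to (3)

\[ m\alpha(1-\alpha) + (n-m)\beta(1-\beta) = np\alpha(1-\alpha) + nq\beta(1-\beta) + O(z\sqrt{n}) = npq(1+\alpha-\beta)(1-\alpha+\beta) + O(z\sqrt{n}) \]

Thus, asymptotically,

\[ m\alpha(1-\alpha) + (n-m)\beta(1-\beta) \approx npq(1+\alpha-\beta)(1-\alpha+\beta) \]

and

\[ \Sigma_2 = \frac{1}{\sqrt{2\pi npq(1+\alpha-\beta)(1-\alpha+\beta)}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]
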
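To estimate the sum $\Sigma_1$, we introduce the notation

\[ w_i = C_m^i \alpha^{m-i} (1-\alpha)^{i} C_{n-m}^i (1-\beta)^{n-m-i} \beta^{i} \]

and note that the relation

\[ \frac{w_i}{w_{i+1}} = \frac{(i+1)^2 (1-\beta)\alpha}{(m-i)(n-m-i)(1-\alpha)\beta} \]

increases with increasing i and remains less than unity for i that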
are not too large. Let j^m{\—a)—u^yma{\--^, 

Vj = Uji tly.j == My — « 

and for the remaining values of 

Vi = VjyJ~ * 


2, <'’! + '’« + 


+ W/ < t)y 


1 

1—H 


It is clear that 





Since, in accordance with calculations carried out earlier,

\[ w_j = \frac{1}{2\pi\sqrt{m(n-m)\alpha\beta(1-\alpha)(1-\beta)}}\, e^{-\frac{u_1^2 + v_1^2}{2}} (1 + \omega_n')(1 + \omega_n'') \]

and for sufficiently large n 

it follows that 

S,=o(i) 

In similar fashion it becomes evident that 

2.=o(i) 

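As a result we find that $\Sigma_2$ is the main part of $A_{mn}$.
Comparing with the sought-for probabilities, we find:

\[ P_n(m, EE) = \frac{p_1 \beta}{\sqrt{2\pi[m\alpha(1-\alpha) + (n-m)\beta(1-\beta)]}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]

\[ P_n(m, E\bar{E}) = \frac{p_1 (1-\alpha)}{\sqrt{2\pi[m\alpha(1-\alpha) + (n-m)\beta(1-\beta)]}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]

\[ P_n(m, \bar{E}E) = \frac{q_1 \beta}{\sqrt{2\pi[m\alpha(1-\alpha) + (n-m)\beta(1-\beta)]}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]

\[ P_n(m, \bar{E}\bar{E}) = \frac{q_1 (1-\alpha)}{\sqrt{2\pi[m\alpha(1-\alpha) + (n-m)\beta(1-\beta)]}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]

From this we conclude that

\[ P_n(m) = \frac{1}{\sqrt{2\pi npq\, \dfrac{1+\alpha-\beta}{1-\alpha+\beta}}}\, e^{-\frac{z^2}{2}} (1 + \omega_n) \]
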

The local theorem is proved. 
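
The quality of the local approximation can be examined numerically. The Python sketch below is an illustration under assumed parameter values ($\alpha = 0.7$, $\beta = 0.4$, $p_1 = 0.5$, n = 400); the function names are hypothetical. The exact distribution of the number of occurrences is obtained by dynamic programming over the pair (number of occurrences so far, outcome of the last trial) and compared with the normal expression whose variance is $npq(1+\alpha-\beta)/(1-\alpha+\beta)$.

```python
from math import exp, pi, sqrt

def exact_distribution(n, alpha, beta, p1):
    """Probabilities of m = 0..n occurrences of E in n chain-dependent trials."""
    # e[m] / f[m]: probability of m occurrences so far, last trial E / not-E.
    e = [0.0, p1]
    f = [1.0 - p1, 0.0]
    for _ in range(n - 1):
        new_e = [0.0] * (len(e) + 1)
        new_f = [0.0] * (len(e) + 1)
        for m in range(len(e)):
            new_e[m + 1] += e[m] * alpha + f[m] * beta
            new_f[m] += e[m] * (1 - alpha) + f[m] * (1 - beta)
        e, f = new_e, new_f
    return [x + y for x, y in zip(e, f)]

def local_approximation(n, m, alpha, beta):
    """Normal approximation with variance n*p*q*(1 + delta)/(1 - delta)."""
    delta = alpha - beta
    p = beta / (1 - delta)
    q = 1 - p
    variance = n * p * q * (1 + delta) / (1 - delta)
    z = (m - n * p) / sqrt(variance)
    return exp(-z * z / 2) / sqrt(2 * pi * variance)

n, alpha, beta, p1 = 400, 0.7, 0.4, 0.5
dist = exact_distribution(n, alpha, beta, p1)
for m in (200, 215, 229, 245, 260):
    print(m, dist[m], local_approximation(n, m, alpha, beta))
```

With $\alpha = \beta$ the factor $(1+\alpha-\beta)/(1-\alpha+\beta)$ equals 1 and the expression reduces to the one for independent trials.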

We note that if the transition probabilities satisfy the equality

\[ \alpha = \beta \]

then the local theorem assumes the same form as in the case of
independent trials.

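Proceeding in the usual way we can also derive the integral
limit theorem from the local theorem: no matter what $z_1$ and $z_2$
are,

\[ P\left\{ z_1 \le \frac{m - np}{\sqrt{npq\, \dfrac{1+\alpha-\beta}{1-\alpha+\beta}}} < z_2 \right\} = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{z^2}{2}}\, dz + \omega_n \]

The quantity $\omega_n$ tends to zero uniformly in $z_1$ and $z_2$ as n
increases to infinity.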


EXERCISES 


1. The transition probabilities are given by the matrix 


\[ \pi_1 = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 1/2 & 1/3 & 1/6 \\ 1/2 & 1/3 & 1/6 \end{pmatrix} \]


What is the number of the states? Find the two-step transition probabilities 
from state to state. 

2. An electron may reside in one of a countable set of orbits depending on 
its energy. Transition from the iih orbit to the /th orbit takes place in one 

second with a probability Find: (a) the transition probabilities for 

two seconds; (b) the constants $c_i$.

3. The transition probabilities are given by the matrix 





0 1 I 
2 2 


0 

1 


2 


2 






Is Markov’s ergodic theorem applicable in this case? If it is, then find the 
limiting probabilities. 



CHAPTER 4


Random Variables 
and Distribution Functions 


Sec. 21. Basic Properties of Distribution Functions 

One of the basic concepts of probability theory is that of the random 
variable. Before giving a formal definition we shall illustrate it with 
a number of examples. 

The number of cosmic particles impinging on some area of the
earth's surface during a definite time interval is subject to appreciable
variations depending on a multitude of random factors.

The number of calls arriving from subscribers at a telephone exchange
during a definite time interval is also a random variable and
takes on values of one kind or another depending on accidental
circumstances.

The deviation of the point of impact of a shell from the centre of a 
target is determined by a large number of diversified causes of an 
accidental nature. The result is that in the theory of gunfire one has 
to consider the dispersion of shells about the centre of the target as a 
random phenomenon and regard the indicated deviations as random 
variables. 

The velocity of a gas molecule does not remain invariable but changes
depending on collisions with other molecules, of which there are
great numbers even within a very brief span of time. Knowing the
molecule velocity at a given instant, one cannot state with full
definiteness what its value will be, say 0.01 or 0.001 second hence.
Change of molecular velocity is of a random nature.

These examples show very definitely that random variables are 
involved in the most diversified fields of science and technology. 
The natural and extremely important problem arises of creating 
methods for studying random variables. 

Despite the diversity of concrete content in the foregoing examples, 
they essentially present the same picture from the viewpoint of the 
mathematician. Namely, in each instance we have to do with a quantity 
that in one way or another describes the phenomenon under study. 





Under the effect of random circumstances, each of these quantities
is capable of taking on a variety of values. One cannot state beforehand
what value the quantity will assume, for it varies in random
fashion from trial to trial.

Therefore, in order to know a random variable it is first and foremost
necessary to know the values that it can assume. However,
a simple list of values of the random variable is not enough for us
to draw essential conclusions. Indeed, if in the fourth example we
consider a gas at different temperatures, the possible values of molecular
velocities will remain the same, whereas the states of the gas will
differ. Thus, to specify a random variable it is necessary to know not
only what values it can assume, but also how often, that is, with what
probability, it assumes these values.

The diversity of random variables is extremely great. The number 
of assumed values may be finite, countable and uncountable; the 
values may be distributed discretely or fill the intervals continuously, 
or not fill the intervals, but be spread out, everywhere dense. In order 
to specify the probabilities of the values of random variables that are 
so diversified, and to be able to specify them in one and the same 
fashion, we introduce into the theory of probability the concept of 
the distribution function of a random variable. 

Let $\xi$ be a random variable and $x$ be an arbitrary real number.
The probability that $\xi$ will take on a value less than $x$ is called the
distribution function of probabilities of the random variable $\xi$:

$$F(x) = P\{\xi < x\}$$

Let us agree from now on, as a rule, to denote random variables 
by Greek letters and the values that they assume by lower-case Latin 
letters. 

Let us summarize what has been said, remaining at the level of 
a qualitative description: a random variable is a variable quantity 
whose values depend on chance and for which a distribution function 
of probabilities has been defined. 

We consider examples of distribution functions. 

Example 1. Denote by $\mu$ the number of occurrences of event $A$
in a sequence of $n$ independent trials, in each of which the probability
of its occurrence is a constant equal to $p$. Depending on chance, $\mu$
can assume all integral values from 0 to $n$ inclusive. According to the
results of Chapter 2,

$$P_n(m) = P\{\mu = m\} = C_n^m p^m q^{n-m}, \qquad q = 1 - p$$


* A formal definition of a random variable is given later in this section.




The distribution function of the variable $\mu$ is defined in the following
manner:

$$F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \sum_{k < x} P_n(k) & \text{for } 0 < x \le n \\ 1 & \text{for } x > n \end{cases}$$

The distribution function is a step-like line with jumps at the points
$x = 0, 1, 2, \ldots, n$; the jump at the point $x = k$ is $P_n(k)$.
The foregoing example shows that the so-called Bernoulli scheme
may be included in the general theory of random variables.
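
By way of illustration, the step function of Example 1 can be evaluated directly from its definition. The following Python lines are only a sketch; the names `binomial_pmf` and `binomial_df` are ours and do not come from the text. Note the strict inequality $k < x$, which makes the value at an integer point exclude the jump at that point.

```python
from math import comb


def binomial_pmf(m, n, p):
    """P_n(m) = C(n, m) p^m q^(n-m), with q = 1 - p."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def binomial_df(x, n, p):
    """F(x) = P{mu < x}: sum of P_n(k) over the integers 0 <= k <= n with k < x."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if k < x)


n, p = 4, 0.5
print(binomial_df(0, n, p))    # 0: no value of mu lies below 0
print(binomial_df(2, n, p))    # 0.3125 = P_4(0) + P_4(1); the jump at x = 2 is not yet counted
print(binomial_df(2.5, n, p))  # 0.6875: the jump P_4(2) = 0.375 has been added
```
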

Example 2. Let the random variable $\xi$ take on the values 0, 1, 2, ...
with the probabilities

$$p_n = P\{\xi = n\} = \frac{\lambda^n e^{-\lambda}}{n!} \qquad (n = 0, 1, 2, \ldots)$$

where $\lambda > 0$ is a constant. The distribution function of $\xi$ is in the form
of a sort of staircase with an infinite number of steps, with jumps at
all nonnegative integral points. The magnitude of the jump at the
point $x = n$ is equal to $p_n$; for $x \le 0$ we have $F(x) = 0$. Of the random
variable examined in this case, it is said that it is distributed in
accordance with the Poisson law.

Example 3. We say that a random variable is normally distributed
if its distribution function has the form

$$\Phi(x) = C \int_{-\infty}^{x} e^{-\frac{(z-a)^2}{2\sigma^2}}\,dz$$

where $C > 0$, $\sigma > 0$, and $a$ are constants. Later, we will establish
the relationship between the constants $\sigma$ and $C$ and will elucidate
the probabilistic meaning of the parameters $a$ and $\sigma$. Normally
distributed random variables play a particularly important role in
probability theory and its applications; we will have good reason to be
convinced of this later on.

Note that whereas in the first two examples considered above the random
variable could take on only a finite or countable set of values
(discrete variables), random variables distributed in accord with the normal
law can assume values from any interval. Indeed, as we shall see
below, the probability of a normally distributed random variable
taking on a value lying in the interval $x_1 \le \xi < x_2$ is equal to

$$\Phi(x_2) - \Phi(x_1) = C \int_{x_1}^{x_2} e^{-\frac{(z-a)^2}{2\sigma^2}}\,dz$$

and consequently is positive for any $x_1$ and $x_2$ ($x_1 < x_2$).
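
The positivity of this probability is easy to check numerically. The sketch below is ours: it assumes the value $C = 1/(\sigma\sqrt{2\pi})$, which the text establishes only in Sec. 22, and expresses $\Phi$ through the error function of the Python standard library. The names `normal_df` and `interval_probability` are hypothetical.

```python
from math import erf, sqrt


def normal_df(x, a=0.0, sigma=1.0):
    """Distribution function of the normal law with parameters a and sigma."""
    return 0.5 * (1.0 + erf((x - a) / (sigma * sqrt(2.0))))


def interval_probability(x1, x2, a=0.0, sigma=1.0):
    """P{x1 <= xi < x2} = Phi(x2) - Phi(x1)."""
    return normal_df(x2, a, sigma) - normal_df(x1, a, sigma)


# Even a very short interval carries positive probability.
print(interval_probability(0.0, 0.001))
# The interval (a - sigma, a + sigma) carries roughly 0.68 of the probability.
print(interval_probability(-1.0, 1.0))
```
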





Now, after these preliminary remarks of an intuitive nature, we 
pass to a rigorous formal exposition of the notion of a random variable. 

In defining a random variable we shall proceed, as in the general
concept of a random event, from a set of elementary events $U$,
a set $F$ of random events, and a probability measure $P\{A\}$
defined on it. In other words, our point of departure is a probability
space $\{U, F, P\}$. With each elementary event $e$ we associate a certain
number

$$\xi = f(e)$$

We say that $\xi$ is a random variable if the function $f(e)$ is measurable
relative to the probability introduced into the set $U$ under consideration.
To put it otherwise, we demand that for each value of $x$ ($-\infty < x < +\infty$)
the set $A_x$ of those $e$ for which $f(e) < x$ should belong to the set
$F$ of random events and, hence, that for it there should be defined the
probability

$$P\{\xi < x\} = P\{A_x\} = F(x)$$

which we have called the distribution function of the random variable $\xi$.

Example 4. We consider a sequence of $n$ independent trials in
each of which the probability of occurrence of the event $A$ is constant
and equal to $p$. In this example, the elementary events consist of
sequences of occurrences and nonoccurrences of the event $A$ in $n$ trials.
Thus, one of the elementary events will be the occurrence of event $A$
in each of the trials. It is easy to compute that there will be a total
of $2^n$ elementary events.

We define the function $\mu = f(e)$ of an elementary event $e$ as follows:
it is equal to the number of occurrences of the event $A$ in the elementary
event $e$. According to the results of Chapter 2,

$$P\{\mu = k\} = P_n(k) = C_n^k p^k q^{n-k}$$

The measurability of the function $\mu = f(e)$ in the probability field
is immediately obvious, whence, by definition, we conclude that $\mu$
is a random variable.
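
The construction of Example 4 can be carried out literally for small $n$. In the following sketch (ours; the name `distribution_of_mu` is hypothetical) every elementary event $e$ is a tuple of zeros and ones, the function $f(e)$ is the number of ones, and the probability of $e$ is $p^k q^{n-k}$; summing over the elementary events with $f(e) = k$ should reproduce Bernoulli's formula.

```python
from itertools import product
from math import comb


def distribution_of_mu(n, p):
    """Return [P{mu = 0}, ..., P{mu = n}] by enumerating all 2^n elementary events."""
    q = 1 - p
    probs = [0.0] * (n + 1)
    for e in product((0, 1), repeat=n):  # 1 marks an occurrence of A
        k = sum(e)                       # mu = f(e)
        probs[k] += p**k * q ** (n - k)  # probability of the elementary event e
    return probs


n, p = 3, 0.3
for k, value in enumerate(distribution_of_mu(n, p)):
    # Each line shows the enumerated probability next to C(n, k) p^k q^(n-k).
    print(k, value, comb(n, k) * p**k * (1 - p) ** (n - k))
```
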


Example 5. Three observations are taken of the position of a
molecule moving in a straight line. The set of elementary events
consists of the points of three-dimensional Euclidean space $R_3$. The
set of random events $F$ consists of all possible Borel sets in the space $R_3$.

For each random event $A$ the probability $P(A)$ is defined by the
equation

$$P(A) = \frac{1}{(\sigma\sqrt{2\pi})^3} \iiint_A e^{-\frac{1}{2\sigma^2}\left[(x_1-a)^2 + (x_2-a)^2 + (x_3-a)^2\right]}\,dx_1\,dx_2\,dx_3$$

Now consider the function $\xi = f(e)$ of the elementary event
$e = (x_1, x_2, x_3)$ defined by the equation

$$\xi = \frac{1}{3}(x_1 + x_2 + x_3)$$

This function is measurable relative to the probability we introduced,
and so $\xi$ is a random variable. Its distribution function is

$$F(x) = P\{\xi < x\} = \frac{1}{(\sigma\sqrt{2\pi})^3} \iiint_{x_1 + x_2 + x_3 < 3x} e^{-\frac{1}{2\sigma^2}\left[(x_1-a)^2 + (x_2-a)^2 + (x_3-a)^2\right]}\,dx_1\,dx_2\,dx_3 = \frac{\sqrt{3}}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{3(z-a)^2}{2\sigma^2}}\,dz$$
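
The last equality says that the mean of the three observations is again normally distributed, with the same $a$ and with $\sigma$ replaced by $\sigma/\sqrt{3}$. A rough Monte Carlo check is sketched below; it is our illustration, not part of the derivation, and the function names, the number of trials and the seed are arbitrary choices.

```python
import random
from math import erf, sqrt


def empirical_df_of_mean(x, a, sigma, trials=50_000, seed=0):
    """Estimate P{(x1 + x2 + x3) / 3 < x} for independent normal observations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xi = (rng.gauss(a, sigma) + rng.gauss(a, sigma) + rng.gauss(a, sigma)) / 3
        if xi < x:
            hits += 1
    return hits / trials


def theoretical_df_of_mean(x, a, sigma):
    """Normal distribution function with parameters a and sigma / sqrt(3)."""
    s = sigma / sqrt(3)
    return 0.5 * (1 + erf((x - a) / (s * sqrt(2))))


# The two numbers should agree to about two decimal places.
print(empirical_df_of_mean(0.5, 0.0, 1.0))
print(theoretical_df_of_mean(0.5, 0.0, 1.0))
```
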


From the point of view just developed, operations on random
variables reduce to familiar operations on functions. Thus, if $\xi_1$
and $\xi_2$ are random variables, that is, measurable functions relative
to the probability introduced,

$$\xi_1 = f_1(e), \qquad \xi_2 = f_2(e)$$

then any Borel function of these variables is also a random variable.
To illustrate,

$$\zeta = \xi_1 + \xi_2$$

is measurable relative to the introduced probability and for this
reason is a random variable.

In Sec. 24 we will develop this remark and will derive a number
of results that are important in theory and applications. In particular,
a formula will be derived for the distribution function of a
sum based on the distribution of the summands.

With the aid of the distribution function of the random variable
$\xi$ it is possible to define the probability of the inequality $x_1 \le \xi < x_2$
for any $x_1$ and $x_2$. Indeed, if by $A$ we denote the event that consists
in $\xi$ taking on a value less than $x_2$, by $B$ the event consisting in the
fact that $\xi < x_1$ and, finally, by $C$ the event that $x_1 \le \xi < x_2$, then it
is obvious that the following equation holds:

$$A = B + C$$

Since events $B$ and $C$ are mutually exclusive, it follows that

$$P(A) = P(B) + P(C)$$

But

$$P(A) = F(x_2), \qquad P(B) = F(x_1), \qquad P(C) = P\{x_1 \le \xi < x_2\}$$

therefore

$$P\{x_1 \le \xi < x_2\} = F(x_2) - F(x_1) \qquad (1)$$

Since, by definition, probability is a nonnegative number, it
follows from (1) that for any $x_1$ and $x_2$ ($x_2 > x_1$) we have the
inequality

$$F(x_1) \le F(x_2)$$

that is, the distribution function of a random variable is a
nondecreasing function.

It is further obvious that the distribution function $F(x)$ for any $x$
satisfies the inequality

$$0 \le F(x) \le 1 \qquad (2)$$

We say that at $x = x_0$ the distribution function $F(x)$ has a
jump if

$$F(x_0 + 0) - F(x_0 - 0) = C_0 > 0$$

A distribution function cannot have more than a countable set of
jumps. Indeed, a distribution function cannot have more than one
jump of magnitude greater than $\frac{1}{2}$, or more than three of magnitude
from one fourth to one half ($\frac{1}{4} < C_0 \le \frac{1}{2}$). Generally, there can be
no more than $2^n - 1$ jumps of magnitude from $\frac{1}{2^n}$ to $\frac{1}{2^{n-1}}$. It is quite
clear that we can number all the jumps, arranging them in magnitude
beginning with large values and repeating equal values as many
times as the function $F(x)$ has jumps of that magnitude.

We shall now establish some other general properties of distribution
functions. We define $F(-\infty)$ and $F(+\infty)$ by the equations

$$F(-\infty) = \lim_{n \to +\infty} F(-n), \qquad F(+\infty) = \lim_{n \to +\infty} F(+n)$$

and will prove that

$$F(-\infty) = 0, \qquad F(+\infty) = 1$$

Indeed, since the inequality $\xi < +\infty$ is certain, it follows that

$$P\{\xi < +\infty\} = 1$$

Denote by $Q_k$ the event that $k - 1 \le \xi < k$. Since the event
$\xi < +\infty$ is equivalent to the sum of events $Q_k$, it follows, on the
basis of the extended addition axiom, that

$$P\{\xi < +\infty\} = \sum_{k=-\infty}^{\infty} P\{Q_k\}$$

Consequently, as $n \to \infty$,

$$\sum_{k=1-n}^{n} P\{Q_k\} = \sum_{k=1-n}^{n} \left[F(k) - F(k-1)\right] = F(n) - F(-n) \to 1$$

From this, taking into account the inequality (2), we conclude that
as $n \to \infty$

$$F(n) \to 1, \qquad F(-n) \to 0$$




The distribution function is continuous on the left. 

Choose some increasing sequence $x_0 < x_1 < x_2 < \ldots < x_n < \ldots$
converging to $x$.

Denote by $A_n$ the event $\{x_n \le \xi < x\}$. It is then clear that $A_i \subset A_j$
for $i > j$, and the product of all the events $A_n$ is an impossible
event. On the basis of the continuity axiom, we should have

$$\lim_{n \to \infty} P(A_n) = \lim_{n \to \infty} \{F(x) - F(x_n)\} = F(x) - \lim_{n \to \infty} F(x_n) = F(x) - F(x - 0) = 0$$

which is what we set out to prove.

In exactly the same way it can be proved that

$$P\{\xi \le x\} = F(x + 0)$$


We see, therefore, that every distribution function is a nondecreasing
function that is continuous on the left and satisfies the conditions
$F(-\infty) = 0$ and $F(+\infty) = 1$. The converse is also true: every function
that satisfies the enumerated conditions may be regarded as the
distribution function of some random variable.

We note that whereas every random variable uniquely defines its
distribution function, there are an arbitrary number of different
random variables having one and the same distribution function.
Thus, if $\xi$ takes on two values $-1$ and $+1$, each with a probability
$\frac{1}{2}$, and $\eta = -\xi$, then it is clear that $\xi$ is always different from $\eta$.
Nevertheless, both these random variables have one and the same
distribution function

$$F(x) = \begin{cases} 0 & \text{for } x \le -1 \\ \frac{1}{2} & \text{for } -1 < x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$$
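
This remark can be verified on the smallest possible probability space. In the sketch below (ours; the labels `e1`, `e2` and the helper `df` are hypothetical) the two variables differ at every elementary event, and yet their distribution functions coincide at every test point.

```python
from fractions import Fraction

# Two equally likely elementary events.
U = {"e1": Fraction(1, 2), "e2": Fraction(1, 2)}
xi = {"e1": -1, "e2": 1}              # xi = f(e)
eta = {e: -v for e, v in xi.items()}  # eta = -xi


def df(var, x):
    """F(x) = P{var < x}, computed from the probabilities of elementary events."""
    return sum(U[e] for e in U if var[e] < x)


print(all(xi[e] != eta[e] for e in U))  # True: xi and eta never coincide
for x in (-2, -1, 0, 1, 2):
    print(x, df(xi, x), df(eta, x))     # the last two columns are identical
```
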


Sec. 22. Continuous and Discrete Distributions 

The behaviour of a random variable is sometimes described not
by specifying its distribution function but in some other way. Any
such description is called a distribution law of the random variable
if by following specific rules it is possible to obtain the distribution
function from it. For instance, the interval function $P\{x_1, x_2\}$, which
is the probability of the inequality $x_1 \le \xi < x_2$, is such a distribution
law. Indeed, knowing $P\{x_1, x_2\}$ we can find the distribution function
from the formula

$$F(x) = P\{-\infty, x\}$$





We already know that it is also possible from $F(x)$ to find the function
$P\{x_1, x_2\}$ for any $x_1$ and $x_2$:

$$P\{x_1, x_2\} = F(x_2) - F(x_1)$$

As a distribution law, it is often useful to take the set function
$P\{E\}$ defined for all Borel sets and representing the probability that
the random variable $\xi$ will take on a value belonging to the set $E$.
By virtue of the extended addition axiom, the probability $P\{E\}$
is a completely additive set function, that is, for any set $E$ which
is the union of a finite or countable number of disjoint sets $E_k$,

$$P\{E\} = \sum_k P\{E_k\}$$

Of all possible random variables we isolate first of all those which
can assume only a finite or countable set of values. We call these
variables discrete. For a complete probabilistic description of a
discrete random variable $\xi$ which with positive probabilities takes on the
values $x_1, x_2, x_3, \ldots$, it is sufficient to know the probabilities
$p_k = P\{\xi = x_k\}$*. It is obvious that by using the probabilities $p_k$ it is
possible to define the distribution function $F(x)$ by means of the
equation

$$F(x) = \sum_{x_k < x} p_k$$

in which the summation is extended over all indices for which $x_k < x$.

The distribution function of any discrete variable is discontinuous
and increases in jumps for those values of $x$ which are possible values
of $\xi$. The magnitude of the jump of the function $F(x)$ at the point $x$
is, as we found out earlier, equal to the difference $F(x+0) - F(x)$.

If two possible values of the variable $\xi$ are separated by an interval
in which there are no other possible values of $\xi$, then the distribution
function $F(x)$ is constant in this interval. If the number of possible
values of $\xi$ is finite, say $n$, then the distribution function $F(x)$ is a
step-like curve with $n+1$ intervals of constancy. But if there is a
countable set of possible values of $\xi$, then this set may also be everywhere
dense so that there may not be any intervals of constancy in the
distribution function of the discrete random variable. By way of
illustration, let the possible values of $\xi$ be all of the rational numbers
and only them. Let all these numbers be numbered in some fashion:
$r_1, r_2, \ldots$, and let the probabilities $P\{\xi = r_k\} = p_k$ be defined by means
of the equation $p_k = \frac{1}{2^k}$. In our example, all rational points are points
of discontinuity of the distribution function.

* These, and only these, values will be called possible values of the
discrete random variable $\xi$.

As another important class of random variables we isolate those
for which there is a nonnegative function $p(x)$ that satisfies the
following equation for any $x$:

$$F(x) = \int_{-\infty}^{x} p(z)\,dz$$

Random variables that possess this property are called continuous;
the function $p(x)$ is called the density of probability distribution or
the probability density function.

If the function $F(x)$ is absolutely continuous, and all the more so
if it is differentiable for all values of the argument, then its derivative
is the density function: $p(x) = F'(x)$.

We note that the density function has the following properties:

(1) $p(x) \ge 0$;

(2) for any $x_1$ and $x_2$ it satisfies the equation

$$P\{x_1 \le \xi < x_2\} = \int_{x_1}^{x_2} p(x)\,dx$$

in particular, if $p(x)$ is continuous at the point $x$, then to within
higher-order infinitesimals $P\{x \le \xi < x + dx\} = p(x)\,dx$;

(3) $\int_{-\infty}^{\infty} p(x)\,dx = 1$.

Quantities distributed in accord with the normal or the uniform 
law* are instances of continuous random variables. 

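Example. Let us examine the normal distribution law more closely.
For it the density function is

$$p(x) = C e^{-\frac{(x-a)^2}{2\sigma^2}}$$

The constant $C$ is determined on the basis of Property 3. Indeed,

$$C \int_{-\infty}^{\infty} e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx = 1$$

Changing the variables $\frac{x-a}{\sigma} = z$ reduces this equation to the form

$$C\sigma \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz = 1$$

The integral on the left-hand side of this equation is known as
the Poisson integral; here,

$$\int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz = \sqrt{2\pi}$$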


* This is a law with a distribution function varying linearly from 0 to 1 in
some interval (a, b) and equal to zero to the left of the point a and equal to
one to the right of b.





We thus find that

$$C = \frac{1}{\sigma\sqrt{2\pi}}$$

and, consequently, for the normal distribution

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$
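
Both the value of the Poisson integral and the resulting normalization can be confirmed by a crude numerical integration. The sketch below is ours; the midpoint rule, the truncation of the infinite range to ten units of $\sigma$ on either side, and the names `integrate` and `normal_density` are all assumptions made for illustration.

```python
from math import exp, pi, sqrt


def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


def normal_density(x, a, sigma):
    """p(x) = exp(-(x - a)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return exp(-((x - a) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


# The Poisson integral, compared with sqrt(2 pi).
print(integrate(lambda z: exp(-z * z / 2), -10.0, 10.0), sqrt(2 * pi))

# Property 3 for a = 1, sigma = 2: the density integrates to (nearly) one.
a, sigma = 1.0, 2.0
print(integrate(lambda x: normal_density(x, a, sigma), a - 10 * sigma, a + 10 * sigma))
```
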


The function $p(x)$ reaches a maximum at $x = a$ and has points of
inflexion at $x = a \pm \sigma$; the axis of abscissas serves it as an asymptote
as $x \to \pm\infty$. To illustrate the effect of the parameter $\sigma$ on the shape
of the graph of the normal density function, we give (in Fig. 14) the
graphs of $p(x)$ for $a = 0$ and (1) $\sigma^2 = \frac{1}{4}$; (2) $\sigma^2 = 1$; (3) $\sigma^2 = 4$. We see
that the smaller the value of $\sigma$, the greater the maximum value of
the curve $p(x)$ and the steeper the drop. For one thing, this means
that the probability of falling in the interval $(-\alpha, \alpha)$ is greater for
the normally distributed random variable (with parameter $a = 0$) for
which the quantity $\sigma$ is smaller. Hence, we can consider $\sigma$ a
characteristic of the dispersion of the values of the variable $\xi$. For $a \ne 0$, the
density curves have the same shape but are shifted to the right ($a > 0$)
or to the left ($a < 0$) depending on the sign of the parameter $a$.

Of course, there are random variables other than discrete and
continuous. Besides those that behave in one set of intervals like
continuous variables and in others like discrete variables, there are
variables which are neither discrete nor continuous in any interval. In
this group of random variables are all those whose distribution
functions are continuous but increase only on a set of Lebesgue
measure zero. An example of such a random variable is the quantity
having the well-known Cantor curve as its distribution function. Let
us recall the construction of this curve. The variable $\xi$ only takes
on values between zero and unity. Thus its distribution function
satisfies the equations

$$F(x) = 0 \text{ for } x \le 0, \qquad F(x) = 1 \text{ for } x > 1$$

Within the interval (0, 1), $\xi$ assumes values only in the first and
the last third, in each with a probability $\frac{1}{2}$. Thus,

$$F(x) = \frac{1}{2} \text{ for } \frac{1}{3} < x \le \frac{2}{3}$$

In the intervals $\left(0, \frac{1}{3}\right)$ and $\left(\frac{2}{3}, 1\right)$, $\xi$ again can assume values
only in the first and the last third of each of them, and in each
with a probability $\frac{1}{4}$. This defines the values of $F(x)$ in two more
intervals:

$$F(x) = \frac{1}{4} \text{ for } \frac{1}{9} < x \le \frac{2}{9}$$

$$F(x) = \frac{3}{4} \text{ for } \frac{7}{9} < x \le \frac{8}{9}$$

In each of the remaining intervals the same construction is repeated;
this process continues ad infinitum. As a result, the function $F(x)$
proves to be defined on a countable set of intervals which are the
complementary intervals of a certain nowhere-dense perfect set of measure
zero. On this set we extend the definition of $F(x)$ by continuity. The
variable $\xi$ with the distribution function thus defined is not discrete, for its
distribution function is continuous, but at the same time $\xi$ is not continuous,
for its distribution function is not the integral of its derivative.
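
The construction just described translates into a short computation on the ternary digits of $x$: a digit 0 or 2 selects the first or the last third and contributes a binary digit 0 or 1, while the first digit 1 means that $x$ lies in an interval of constancy. The following sketch is ours; the name `cantor_df` and the truncation depth are hypothetical, and floating-point arithmetic makes it an approximation rather than an exact evaluation.

```python
def cantor_df(x, depth=30):
    """Approximate value of the Cantor distribution function at x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = min(int(x), 2)     # next ternary digit of x
        if digit == 1:             # x lies in a middle third: F is constant there
            return value + weight
        if digit == 2:             # last third: add the probability of the first third
            value += weight
        x -= digit
        weight /= 2
    return value


print(cantor_df(0.5))   # 0.5  (middle third of (0, 1))
print(cantor_df(0.15))  # 0.25 (middle third of (0, 1/3))
print(cantor_df(0.8))   # 0.75 (middle third of (2/3, 1))
```
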

All the definitions that we have introduced are readily carried over
to the case of conditional probabilities. Thus, for instance, if the event
$B$ is such that $P\{B\} > 0$, then we shall call the function
$F(x/B) = P\{\xi < x/B\}$ the conditional distribution function of the random
variable $\xi$, given the condition $B$. It is obvious that $F(x/B)$ possesses
all the properties of an ordinary distribution function.


Sec. 23. Multidimensional Distribution Functions 

In what follows we will need, in addition to the notion of a 
random variable, also the concept of a random vector or, as it is 
often called, a multidimensional random variable. 

Let us consider a probability space $\{U, F, P\}$ on which are defined
$n$ random variables:

$$\xi_1 = f_1(e), \quad \xi_2 = f_2(e), \quad \ldots, \quad \xi_n = f_n(e)$$

The vector $(\xi_1, \xi_2, \ldots, \xi_n)$ is called an n-dimensional random variable.

Let $(\xi_1, \xi_2, \ldots, \xi_n)$ be a random vector. Denote by
$\{\xi_1 < x_1, \xi_2 < x_2, \ldots, \xi_n < x_n\}$ the set of elementary events $e$ for
which all the following inequalities hold at the same time:
$f_1(e) < x_1, f_2(e) < x_2, \ldots, f_n(e) < x_n$.
Inasmuch as this event is the product of events $\{f_1(e) < x_1\}$, $\{f_2(e) < x_2\}$,
..., $\{f_n(e) < x_n\}$, it belongs to the set $F$, that is,

$$\{\xi_1 < x_1, \xi_2 < x_2, \ldots, \xi_n < x_n\} \in F$$

Thus, for any set of numbers $x_1, x_2, \ldots, x_n$ there is defined a
probability $F(x_1, x_2, \ldots, x_n) = P\{\xi_1 < x_1, \xi_2 < x_2, \ldots, \xi_n < x_n\}$. This
function of $n$ arguments is called the n-dimensional distribution
function of the random vector $(\xi_1, \xi_2, \ldots, \xi_n)$.
Later on we will resort to a geometrical illustration and will regard
the variables $\xi_1, \xi_2, \ldots, \xi_n$ as the coordinates of points in
n-dimensional Euclidean space. It is obvious that the position of a point
$(\xi_1, \xi_2, \ldots, \xi_n)$ depends on chance and that the function $F(x_1, \ldots, x_n)$
in such an interpretation yields the probability that the point
$(\xi_1, \ldots, \xi_n)$ will fall in the n-dimensional parallelepiped $\xi_1 < x_1$,
$\xi_2 < x_2, \ldots, \xi_n < x_n$ with edges parallel to the coordinate axes.

Using the distribution function it is easy to compute the probability
that the point $(\xi_1, \xi_2, \ldots, \xi_n)$ will be inside the parallelepiped

$$a_i \le \xi_i < b_i \qquad (i = 1, 2, \ldots, n)$$

where $a_i$ and $b_i$ are arbitrary constants. It is easy to compute that

$$P\{a_1 \le \xi_1 < b_1, a_2 \le \xi_2 < b_2, \ldots, a_n \le \xi_n < b_n\} = F(b_1, b_2, \ldots, b_n) - \sum_{i=1}^{n} p_i + \sum_{i<j} p_{ij} - \ldots + (-1)^n F(a_1, a_2, \ldots, a_n) \qquad (1)$$

where $p_{ij \ldots k}$ denotes the value of the function $F(c_1, c_2, \ldots, c_n)$
for $c_i = a_i$, $c_j = a_j$, ..., $c_k = a_k$ and for the other $c_s$ equal to $b_s$.
We leave the proof of this formula to the reader. We note, for
one thing, that $F(x_1, \ldots, x_{k-1}, +\infty, x_{k+1}, \ldots, x_n)$ gives us the
probability that the following system of inequalities will be fulfilled:

$$\xi_1 < x_1, \ \xi_2 < x_2, \ \ldots, \ \xi_{k-1} < x_{k-1}, \ \xi_{k+1} < x_{k+1}, \ \ldots, \ \xi_n < x_n$$

Since by the extended addition axiom of probabilities

$$P\{\xi_1 < x_1, \ldots, \xi_{k-1} < x_{k-1}, \xi_{k+1} < x_{k+1}, \ldots, \xi_n < x_n\} = \sum_{s=-\infty}^{\infty} P\{\xi_1 < x_1, \ldots, \xi_{k-1} < x_{k-1}, s \le \xi_k < s+1, \xi_{k+1} < x_{k+1}, \ldots, \xi_n < x_n\} = F(x_1, \ldots, x_{k-1}, +\infty, x_{k+1}, \ldots, x_n)$$

it follows that $F(x_1, \ldots, x_{k-1}, +\infty, x_{k+1}, \ldots, x_n)$ is the distribution
function of the $(n-1)$-dimensional random variable
$(\xi_1, \ldots, \xi_{k-1}, \xi_{k+1}, \ldots, \xi_n)$. By further continuing this process we can
determine the k-dimensional distribution functions of any group
of $k$ variables $\xi_{i_1}, \ldots, \xi_{i_k}$ ($1 \le i_1 < i_2 < \ldots < i_k \le n$) using the formula:

$$F_{i_1 \ldots i_k}(x_{i_1}, \ldots, x_{i_k}) = P\{\xi_{i_1} < x_{i_1}, \ldots, \xi_{i_k} < x_{i_k}\} = F(c_1, c_2, \ldots, c_n)$$

where $c_s = x_s$ if $s = i_r$ ($1 \le r \le k$) and $c_s = +\infty$ in other cases. In
particular, the distribution function of the random variable $\xi_k$ is

$$F_k(x) = F(c_1, c_2, \ldots, c_n)$$

where all $c_i$ ($i \ne k$) are equal to $+\infty$, and $c_k = x$.
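
Formula (1) of this section is an alternating sum over the $2^n$ vertices of the parallelepiped, and it can be evaluated mechanically for any given distribution function. The sketch below is ours; `box_probability` and `uniform_df` are hypothetical names, and the uniform law on the unit square serves only as a test case in which the answer is the area of the rectangle.

```python
from itertools import product


def box_probability(F, a, b):
    """P{a_i <= xi_i < b_i for all i}, computed from F by formula (1)."""
    n = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=n):  # 1: take a_i, 0: take b_i
        vertex = [a[i] if c else b[i] for i, c in enumerate(choice)]
        total += (-1) ** sum(choice) * F(*vertex)
    return total


def uniform_df(x, y):
    """Distribution function of the uniform law on the unit square."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x) * clip(y)


# Rectangle 0.2 <= xi_1 < 0.5, 0.1 <= xi_2 < 0.6: area 0.3 * 0.5 = 0.15.
print(box_probability(uniform_df, (0.2, 0.1), (0.5, 0.6)))
```
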

Just as the behaviour of a one-dimensional random variable may
be described not only by means of the distribution function but also
in other ways, so multidimensional random variables may be defined,
say, by means of a nonnegative completely additive set function
$\Phi\{E\}$ defined for arbitrary Borel sets of n-dimensional space. We
define this function as the probability of the point $(\xi_1, \ldots, \xi_n)$ falling
in the set $E$. This method of a probabilistic description of an
n-dimensional random variable should be regarded as the most natural
one and, theoretically speaking, the most appropriate.

Let us consider some examples. 


Example 1. A random vector $(\xi_1, \ldots, \xi_n)$ is said to be uniformly
distributed in the parallelepiped $a_i \le x_i < b_i$ ($1 \le i \le n$), if the
probability of the point $(\xi_1, \xi_2, \ldots, \xi_n)$ falling in any interior domain
of this parallelepiped is proportional to its volume and its falling
into the parallelepiped is a sure event.

The distribution function of the sought-for quantity has the form

$$F(x_1, \ldots, x_n) = \begin{cases} 0 & \text{if } x_i \le a_i \text{ for at least one } i \\ \prod_{i=1}^{n} \frac{c_i - a_i}{b_i - a_i} & \text{otherwise} \end{cases}$$

where $c_i = x_i$ if $a_i < x_i \le b_i$
and $c_i = b_i$ if $x_i > b_i$.




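Example 2. A two-dimensional random variable $(\xi_1, \xi_2)$ is
distributed normally if its distribution function is

$$F(x, y) = C \int_{-\infty}^{x} \int_{-\infty}^{y} e^{-Q(u, v)}\,du\,dv$$

Here, $Q(x, y)$ is a positive definite quadratic form of $x - a$ and
$y - b$, where $a$ and $b$ are constants.

It is known that a positive definite quadratic form of $x - a$ and
$y - b$ may be written in the form

$$Q(x, y) = \frac{(x-a)^2}{2A^2} - r\,\frac{(x-a)(y-b)}{AB} + \frac{(y-b)^2}{2B^2}$$

where $A$ and $B$ are positive numbers and $r$, $a$ and $b$ are real numbers,
$r$ being subject to the condition $-1 \le r \le 1$.

It will readily be seen that for $r^2 \ne 1$ each of the random variables
$\xi_1$ and $\xi_2$ is subject to a one-dimensional normal law. Indeed,

$$F_1(x_1) = P\{\xi_1 < x_1\} = F(x_1, +\infty) = C \int_{-\infty}^{x_1} \int_{-\infty}^{\infty} e^{-Q(x, y)}\,dy\,dx = C \int_{-\infty}^{x_1} e^{-\frac{(x-a)^2}{2A^2}(1-r^2)} \left[ \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[\frac{y-b}{B} - \frac{r(x-a)}{A}\right]^2} dy \right] dx$$

Since

$$\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[\frac{y-b}{B} - \frac{r(x-a)}{A}\right]^2} dy = B \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz = B\sqrt{2\pi}$$

it follows that

$$F_1(x_1) = BC\sqrt{2\pi} \int_{-\infty}^{x_1} e^{-\frac{(x-a)^2}{2A^2}(1-r^2)}\,dx \qquad (2)$$

The constant $C$ may be expressed in terms of $A$, $B$ and $r$. This
dependence may be found from the condition $F_1(+\infty) = 1$. We have

$$1 = BC\sqrt{2\pi} \int_{-\infty}^{\infty} e^{-\frac{(x-a)^2}{2A^2}(1-r^2)}\,dx = \frac{ABC\sqrt{2\pi}}{\sqrt{1-r^2}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\,dz = \frac{2\pi ABC}{\sqrt{1-r^2}}$$

Whence

$$C = \frac{\sqrt{1-r^2}}{2\pi AB}$$

If $r^2 \ne 1$, then we put

$$A = \sigma_1 \sqrt{1-r^2}, \qquad B = \sigma_2 \sqrt{1-r^2}$$

In these new notations the two-dimensional normal law takes on
the following form:

$$F(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} e^{-\frac{1}{2(1-r^2)}\left[\frac{(x-a)^2}{\sigma_1^2} - 2r\frac{(x-a)(y-b)}{\sigma_1\sigma_2} + \frac{(y-b)^2}{\sigma_2^2}\right]}\,dy\,dx$$

The probabilistic meaning of the parameters that enter into this
formula will be elucidated in the next chapter.

When $r^2 = 1$ Equation (2) becomes meaningless. In this case, $\xi_1$
and $\xi_2$ are connected by a linear relation.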

We can establish a number of properties for multidimensional
distribution functions just as we did in the one-dimensional case. We
will simply state them and leave their proofs to the reader. A
distribution function

(1) is a nondecreasing function of each of its arguments,

(2) is continuous on the left in each of its arguments,

(3) satisfies the relations

$$F(+\infty, +\infty, \ldots, +\infty) = 1$$

$$\lim_{x_k \to -\infty} F(x_1, \ldots, x_n) = 0 \qquad (1 \le k \le n)$$

for arbitrary values of the remaining arguments.

In the one-dimensional case we saw that the enumerated properties
are necessary and sufficient for the function $F(x)$ to be a distribution
function of some random variable. In the multidimensional case it
appears that these properties do not suffice. For the function
$F(x_1, \ldots, x_n)$ to be a distribution function we have to add the following
(in addition to the enumerated three properties):

(4) for any $a_i$ and $b_i$ ($i = 1, 2, \ldots, n$) expression (1) is not negative.
That this requirement may not be fulfilled, despite the fact that
the function $F(x_1, \ldots, x_n)$ has Properties 1 to 3, is seen from the
following example. Let

$$F(x, y) = \begin{cases} 0 & \text{if } x \le 0 \text{ or } x + y \le 1 \text{ or } y \le 0 \\ 1 & \text{in the remaining part of the plane} \end{cases}$$

This function satisfies the conditions (1) to (3) but for it

$$F(1, 1) - F\left(1, \tfrac{1}{2}\right) - F\left(\tfrac{1}{2}, 1\right) + F\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = -1 \qquad (3)$$

and, consequently, the fourth condition is not satisfied.

The function $F(x, y)$ cannot be a distribution function because
the difference (3) is, according to the relation (1), equal to the
probability of the point $(\xi_1, \xi_2)$ falling in the rectangle
$\frac{1}{2} \le \xi_1 < 1$, $\frac{1}{2} \le \xi_2 < 1$.
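
The arithmetic of this counterexample is short enough to check by machine. In the sketch below (ours) the function `F` is the one just defined; the non-strict inequalities matter, since the point $\left(\frac{1}{2}, \frac{1}{2}\right)$ lies exactly on the line $x + y = 1$.

```python
def F(x, y):
    """The function of the counterexample: 0 if x <= 0 or y <= 0 or x + y <= 1, else 1."""
    if x <= 0 or y <= 0 or x + y <= 1:
        return 0
    return 1


# Expression (1) of this section for the rectangle 1/2 <= xi_1 < 1, 1/2 <= xi_2 < 1.
difference = F(1, 1) - F(1, 0.5) - F(0.5, 1) + F(0.5, 0.5)
print(difference)  # -1, which cannot be a probability
```
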

If there exists a function $p(x_1, x_2, \ldots, x_n)$ such that for any
$x_1, x_2, \ldots, x_n$ the equation

$$F(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} \ldots \int_{-\infty}^{x_n} p(z_1, z_2, \ldots, z_n)\,dz_n \ldots dz_2\,dz_1$$

holds, then this function is called the probability density function
of the random vector $(\xi_1, \xi_2, \ldots, \xi_n)$. It is readily seen that the
density function has the following properties:

1. $p(x_1, x_2, \ldots, x_n) \ge 0$,

2. The probability of a point $(\xi_1, \xi_2, \ldots, \xi_n)$ falling in some
domain $G$ is

$$\int \ldots \int_G p(x_1, \ldots, x_n)\,dx_n \ldots dx_1$$

In particular, if the function $p(x_1, \ldots, x_n)$ is continuous at
the point $(x_1, \ldots, x_n)$, then the probability of the point
$(\xi_1, \ldots, \xi_n)$ falling in the parallelepiped $x_k \le \xi_k < x_k + dx_k$
($k = 1, 2, \ldots, n$) is, to within higher-order infinitesimals,

$$p(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \ldots dx_n$$

Example 3. As an example of an n-dimensional random variable
having density, we give a variable uniformly distributed in an
n-dimensional domain $G$. If by $V$ we denote the n-dimensional
volume of the domain $G$, the distribution density will be

$$p(x_1, \ldots, x_n) = \begin{cases} 0 & \text{if } (x_1, x_2, \ldots, x_n) \notin G \\ \frac{1}{V} & \text{if } (x_1, x_2, \ldots, x_n) \in G \end{cases}$$


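Example 4. The density of the two-dimensional normal law is
given by the formula

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\, e^{-\frac{1}{2(1-r^2)}\left[\frac{(x-a)^2}{\sigma_1^2} - 2r\frac{(x-a)(y-b)}{\sigma_1\sigma_2} + \frac{(y-b)^2}{\sigma_2^2}\right]}$$

We note that the normal density function retains a constant
value on the ellipses

$$\frac{(x-a)^2}{\sigma_1^2} - 2r\frac{(x-a)(y-b)}{\sigma_1\sigma_2} + \frac{(y-b)^2}{\sigma_2^2} = \lambda^2 \qquad (4)$$

where $\lambda$ is a constant; for this reason, the ellipses (4) are called
ellipses of equal probabilities.

Let us find the probability that the point $(\xi_1, \xi_2)$ will fall within
the ellipse (4). By the definition of a density function

$$P(\lambda) = \iint_{G(\lambda)} p(x, y)\,dx\,dy \qquad (5)$$

where $G(\lambda)$ denotes the domain bounded by the ellipse (4). To
compute this integral, we introduce the polar coordinates

$$x - a = \rho\cos\theta, \qquad y - b = \rho\sin\theta$$

The integral (5) then takes the form

$$P(\lambda) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_0^{2\pi} \int_0^{\frac{\lambda}{s\sqrt{1-r^2}}} e^{-\frac{\rho^2 s^2}{2}}\,\rho\,d\rho\,d\theta$$

where for brevity

$$s^2 = \frac{1}{1-r^2}\left[\frac{\cos^2\theta}{\sigma_1^2} - 2r\frac{\cos\theta\sin\theta}{\sigma_1\sigma_2} + \frac{\sin^2\theta}{\sigma_2^2}\right]$$

Integration with respect to $\rho$ yields

$$P(\lambda) = \frac{1 - e^{-\frac{\lambda^2}{2(1-r^2)}}}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_0^{2\pi} \frac{d\theta}{s^2}$$

Integration with respect to $\theta$ may be performed by the rules for
integrating trigonometric functions, but this is not necessary since
it is automatically carried out by means of probabilistic reasoning.
Indeed,

$$P(+\infty) = 1 = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_0^{2\pi} \frac{d\theta}{s^2}$$

Therefore

$$\int_0^{2\pi} \frac{d\theta}{s^2} = 2\pi\sigma_1\sigma_2\sqrt{1-r^2}$$

and, thus,

$$P(\lambda) = 1 - e^{-\frac{\lambda^2}{2(1-r^2)}}$$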
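
The final formula lends itself to a simulation check. The sketch below is ours and rests on one assumption not made in the text at this point: the pair $(\xi_1, \xi_2)$ is generated from two independent standard normal numbers $u$, $v$ as $\xi_1 = a + \sigma_1 u$, $\xi_2 = b + \sigma_2 (ru + \sqrt{1-r^2}\,v)$, a standard construction whose density is the $p(x, y)$ of Example 4. The function names, the number of trials and the seed are arbitrary.

```python
import random
from math import exp, sqrt


def ellipse_probability_mc(lam, sigma1, sigma2, r, a=0.0, b=0.0, trials=50_000, seed=0):
    """Monte Carlo estimate of the probability of falling within the ellipse (4)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x = a + sigma1 * u
        y = b + sigma2 * (r * u + sqrt(1 - r * r) * v)
        # Left-hand side of equation (4) at the sampled point.
        q = ((x - a) / sigma1) ** 2 \
            - 2 * r * (x - a) * (y - b) / (sigma1 * sigma2) \
            + ((y - b) / sigma2) ** 2
        if q < lam**2:
            hits += 1
    return hits / trials


def ellipse_probability(lam, r):
    """P(lambda) = 1 - exp(-lambda^2 / (2 (1 - r^2)))."""
    return 1 - exp(-(lam**2) / (2 * (1 - r**2)))


# The estimate and the closed form should agree to about two decimal places.
print(ellipse_probability_mc(1.0, 2.0, 0.5, 0.5))
print(ellipse_probability(1.0, 0.5))
```
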

The normal distribution plays an exceedingly great role in a variety
of applied problems. The distribution of many random variables of
practical importance turns out to be subject to the normal distribution
law. For instance, the vast experience of artillery firings carried out
under a diversity of conditions has shown that shell dispersion on a
plane during gunfire from a single gun at a specific target obeys the
normal law. In Chapter 8 we will see that the "universality" of the
normal law is explained by the fact that any random variable that
is the sum of a very large number of independent random variables,
each of which exerts only a slight effect on the sum, is distributed
almost according to the normal law.

The crucial concept of probability theory, the independence of
events, retains its significance for random variables as well. In
accord with the definition of the independence of events we can say
that the random variables $\xi_1, \xi_2, \ldots, \xi_n$ are independent if for any
group $\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_k}$ ($i_1 < i_2 < \ldots < i_k$) of these variables we have
the equation

$$P\{\xi_{i_1} < x_{i_1},\ \xi_{i_2} < x_{i_2},\ \ldots,\ \xi_{i_k} < x_{i_k}\} = P\{\xi_{i_1} < x_{i_1}\}\, P\{\xi_{i_2} < x_{i_2}\} \ldots P\{\xi_{i_k} < x_{i_k}\}$$

for arbitrary $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ and for any $k$ ($1 \le k \le n$). In particular,
the following equation holds for arbitrary $x_1, x_2, \ldots, x_n$:

$$P\{\xi_1 < x_1,\ \xi_2 < x_2,\ \ldots,\ \xi_n < x_n\} = P\{\xi_1 < x_1\}\, P\{\xi_2 < x_2\} \ldots P\{\xi_n < x_n\}$$

or in terms of distribution functions,

$$F(x_1, x_2, \ldots, x_n) = F_1(x_1)\, F_2(x_2) \ldots F_n(x_n)$$

where $F_k(x_k)$ denotes the distribution function of the variable $\xi_k$.
It is readily seen that the converse proposition is also true: if
the distribution function $F(x_1, x_2, \ldots, x_n)$ of a system of random
variables $\xi_1, \xi_2, \ldots, \xi_n$ has the form

$$F(x_1, x_2, \ldots, x_n) = F_1(x_1)\, F_2(x_2) \ldots F_n(x_n)$$

where the functions $F_k(x_k)$ satisfy the relations

$$F_k(+\infty) = 1 \qquad (k = 1, 2, \ldots, n)$$

then the variables $\xi_1, \xi_2, \ldots, \xi_n$ are independent and the functions
$F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$ are their distribution functions.
We leave it to the reader to verify this proposition.

If the independent random variables $\xi_1, \xi_2, \ldots, \xi_n$ have density
functions $p_1(x), p_2(x), \ldots, p_n(x)$, then the n-dimensional variable
$(\xi_1, \xi_2, \ldots, \xi_n)$ has a density function equal to

$$p(x_1, x_2, \ldots, x_n) = p_1(x_1)\, p_2(x_2) \ldots p_n(x_n)$$

Example 5. We consider an n-dimensional random variable, the
components of which $\xi_1, \xi_2, \ldots, \xi_n$ are mutually independent random
variables distributed in accordance with the normal laws:

$$F_k(x_k) = \frac{1}{\sigma_k\sqrt{2\pi}} \int_{-\infty}^{x_k} e^{-\frac{(z-a_k)^2}{2\sigma_k^2}}\, dz$$

In the example at hand, the distribution function is

$$F(x_1, x_2, \ldots, x_n) = (2\pi)^{-\frac{n}{2}} \prod_{k=1}^{n} \frac{1}{\sigma_k} \int_{-\infty}^{x_k} e^{-\frac{(z-a_k)^2}{2\sigma_k^2}}\, dz$$

The n-dimensional density function of the variable $(\xi_1, \xi_2, \ldots, \xi_n)$ is

$$p(x_1, x_2, \ldots, x_n) = \frac{(2\pi)^{-\frac{n}{2}}}{\sigma_1\sigma_2 \ldots \sigma_n} \exp\left\{-\frac{1}{2}\sum_{k=1}^{n} \frac{(x_k-a_k)^2}{\sigma_k^2}\right\} \qquad (6)$$

For n = 2 this formula becomes

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left\{-\frac{(x_1-a_1)^2}{2\sigma_1^2} - \frac{(x_2-a_2)^2}{2\sigma_2^2}\right\}$$

A comparison of this function with the density of the two-dimensional
normal law (Example 4) shows that for the independent
random variables $\xi_1$ and $\xi_2$ the parameter r is equal to 0.

For n = 3 formula (6) may be interpreted as the density function
of the components $\xi_1, \xi_2, \xi_3$ of molecular velocity along the coordinate
axes (the Maxwell distribution) provided it is assumed
that


where m is the mass of a molecule and is a constant. 

Example 6. There are two independent random variables $\xi$ and $\eta$
with distribution functions equal to F(x) and G(x), respectively.
Find the probability that $\eta$ will take on a value less than $\xi$.

We consider the plane $(\xi, \eta)$. For $\eta$ to be less than $\xi$, it is
necessary that the point $(\xi, \eta)$ fall in the half-plane $\eta < \xi$. The
probability of the simultaneous realization of the inequalities

$$x \le \xi < x + dx, \qquad \eta < x$$

is equal to G(x) dF(x). Since x may have any value from $-\infty$ to
$+\infty$, by virtue of the formula of total probability (generalized in
obvious fashion) the desired probability is

$$\alpha = \int_{-\infty}^{\infty} G(x)\, dF(x)$$

In particular, if G(x) = F(x), this probability is

$$\alpha = \int_{-\infty}^{\infty} F(x)\, dF(x)$$

If the function F(x) is continuous, then

$$\alpha = 0.5$$

There is no such simple result for discrete random variables. This
will readily be seen in the case of the random variables $\xi$ and $\eta$
taking on only two values, 0 and 1, with the probabilities p and q = 1 - p,
respectively. In this example it is obvious that

$$\alpha = pq$$
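Both conclusions of Example 6 can be illustrated with a discrete analogue of the integral $\int G(x)\,dF(x)$, namely the sum of $P\{\xi = x\}\,P\{\eta < x\}$ over the possible values x. The sketch below is an illustration only; the helper name and the equiprobable grid used as a stand-in for a continuous law are assumptions made here.

```python
def prob_eta_less_than_xi(xi_dist, eta_dist):
    """Sum over x of P{xi = x} * P{eta < x} for independent discrete variables.

    Each distribution is a dict mapping a value to its probability.
    """
    total = 0.0
    for x, p_x in xi_dist.items():
        g_x = sum(p_y for y, p_y in eta_dist.items() if y < x)  # G(x) = P{eta < x}
        total += p_x * g_x
    return total


# Two-valued case from the text: value 0 with probability p, value 1 with q = 1 - p.
p = 0.3
two_valued = {0: p, 1: 1.0 - p}
alpha_two_valued = prob_eta_less_than_xi(two_valued, two_valued)

# A fine equiprobable grid on [0, 1) approximating a continuous distribution.
m = 400
grid = {k / m: 1.0 / m for k in range(m)}
alpha_grid = prob_eta_less_than_xi(grid, grid)
```

For the two-valued case the sum reduces to pq; for the grid it equals (1 - 1/m)/2, which approaches 0.5 as the probability of a tie vanishes.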


Sec. 24 . Functions of Random Variables 

The information we have obtained about distribution functions
enables us to begin the solution of the following problem: from the
distribution function $F(x_1, x_2, \ldots, x_n)$ of a collection of random
variables $\xi_1, \xi_2, \ldots, \xi_n$ determine the distribution function
$\Phi(y_1, y_2, \ldots, y_k)$ of the variables $\eta_1 = f_1(\xi_1, \xi_2, \ldots, \xi_n)$,
$\eta_2 = f_2(\xi_1, \xi_2, \ldots, \xi_n)$, ..., $\eta_k = f_k(\xi_1, \xi_2, \ldots, \xi_n)$.

The general solution of this problem is extremely simple but requires
an extension of the integral concept. So as not to be drawn aside
into purely analytical problems, we confine ourselves to a consideration
of the most important special cases: discrete and continuous random
variables. In the next section we will give the definition and the
principal properties of the Stieltjes integral; there we will give the general
form of the most important results of the present section.

Let us first consider the case when the n-dimensional vector
$(\xi_1, \xi_2, \ldots, \xi_n)$ has a probability density function $p(x_1, x_2, \ldots, x_n)$. From the
foregoing it is seen that the desired distribution function is defined
by the relation

$$\Phi(y_1, y_2, \ldots, y_k) = \int \ldots \int_D p(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n$$

the region of integration D being determined by the inequalities

$$f_i(x_1, x_2, \ldots, x_n) < y_i \qquad (i = 1, 2, \ldots, k)$$

In the case of discrete random variables the solution is obviously
given by means of an n-fold sum, which is also extended over the
domain D.

We now apply to certain important special cases the general
remark that we just made relative to the solution of the general
problem that we posed.

The Distribution Function of a Sum. Let it be required to find
the distribution function of the sum

$$\eta = \xi_1 + \xi_2 + \ldots + \xi_n$$

if $p(x_1, x_2, \ldots, x_n)$ is the probability density function of the vector
$(\xi_1, \xi_2, \ldots, \xi_n)$. The desired function is equal to the probability of
the point $(\xi_1, \xi_2, \ldots, \xi_n)$ falling in the half-space
$\xi_1 + \xi_2 + \ldots + \xi_n < x$ and consequently

$$\Phi(x) = \int \ldots \int_{\sum x_k < x} p(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n$$

We consider in more detail the case of n = 2. The preceding
formula now takes the form

$$\Phi(x) = \iint_{x_1 + x_2 < x} p(x_1, x_2)\, dx_1\, dx_2 = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{x - x_1} p(x_1, x_2)\, dx_2 \qquad (1)$$

If the variables $\xi_1$ and $\xi_2$ are independent, then
$p(x_1, x_2) = p_1(x_1)\, p_2(x_2)$. Equation (1) may be written in the following
form:

$$\Phi(x) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{x - x_1} p_1(x_1)\, p_2(x_2)\, dx_2 = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{x} p_1(x_1)\, p_2(z - x_1)\, dz = \int_{-\infty}^{x} dz \left\{\int_{-\infty}^{\infty} p_1(x_1)\, p_2(z - x_1)\, dx_1\right\} \qquad (2)$$

In the general case, formula (1) yields

$$\Phi(x) = \int_{-\infty}^{x} dx_1 \int_{-\infty}^{\infty} p(z, x_1 - z)\, dz \qquad (3)$$

The last equations prove that if a multidimensional distribution
of summands has a probability density function, then their sum
also has a density function. This density, in the case of independent
summands, may be written as

$$p(x) = \int_{-\infty}^{\infty} p_1(x - z)\, p_2(z)\, dz \qquad (4)$$

Let us consider some examples. 

Example 1. Let $\xi_1$ and $\xi_2$ be independent and uniformly distributed
in the interval (a, b). Find the density function of the sum
$\eta = \xi_1 + \xi_2$.

The probability density functions of $\xi_1$ and $\xi_2$ are equal to

$$p_1(x) = p_2(x) = \begin{cases} 0 & \text{if } x \le a \text{ or } x > b \\ \dfrac{1}{b-a} & \text{if } a < x \le b \end{cases}$$

Using formula (4) we find that

$$p_\eta(x) = \int_a^b p_1(z)\, p_2(x - z)\, dz = \frac{1}{b-a} \int_a^b p_2(x - z)\, dz$$

From the fact that for x < 2a

$$x - z < 2a - z < a$$

and for x > 2b

$$x - z > 2b - z > b$$

we conclude that for x < 2a and x > 2b

$$p_\eta(x) = 0$$

Now let 2a < x < 2b. The integrand is different from zero for only
those values of z which satisfy the inequalities

$$a < x - z < b$$

or, which is the same thing, the inequalities

$$x - b < z < x - a$$

Since x > 2a, it follows that x - a > a. Obviously, $x - a \le b$ for
$x \le a + b$. Hence, if $2a < x \le a + b$, then it follows that

$$p_\eta(x) = \int_a^{x-a} \frac{dz}{(b-a)^2} = \frac{x - 2a}{(b-a)^2}$$

In exactly the same way, when $a + b < x \le 2b$

$$p_\eta(x) = \int_{x-b}^{b} \frac{dz}{(b-a)^2} = \frac{2b - x}{(b-a)^2}$$

Collecting together the results obtained, we find

$$p_\eta(x) = \begin{cases} 0 & \text{for } x \le 2a \text{ and } x > 2b \\ \dfrac{x - 2a}{(b-a)^2} & \text{for } 2a < x \le a + b \\ \dfrac{2b - x}{(b-a)^2} & \text{for } a + b < x \le 2b \end{cases} \qquad (5)$$

The function $p_\eta(x)$ is called the Simpson distribution law.
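The triangular shape of the Simpson law can be checked by evaluating the convolution integral of two uniform densities numerically. The following sketch is illustrative only: the function names, the midpoint rule and the number of steps are assumptions, not anything prescribed by the text.

```python
def uniform_density(x, a, b):
    """Density of the uniform law on (a, b]."""
    return 1.0 / (b - a) if a < x <= b else 0.0


def simpson_density(x, a, b):
    """Closed-form density of the sum of two independent uniform variables."""
    if 2 * a < x <= a + b:
        return (x - 2 * a) / (b - a) ** 2
    if a + b < x <= 2 * b:
        return (2 * b - x) / (b - a) ** 2
    return 0.0


def convolution_density(x, a, b, steps=20_000):
    """Midpoint-rule value of the integral of p1(z) * p2(x - z) over (a, b)."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        z = a + (i + 0.5) * h
        total += uniform_density(z, a, b) * uniform_density(x - z, a, b)
    return total * h


a, b = 1.0, 3.0
points = [1.0, 2.5, 3.0, 4.0, 4.7, 5.5, 7.0]
differences = [abs(convolution_density(x, a, b) - simpson_density(x, a, b)) for x in points]
```

The numerical integral and the closed form should differ only by the discretization error of the midpoint rule, which here is of the order of the step length.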

In the example considered, the computations are greatly simplified
if we take advantage of geometrical reasoning. As usual, depict
$\xi_1$ and $\xi_2$ as rectangular coordinates in the plane. Then the
probability of the inequality $\xi_1 + \xi_2 < x$ for $2a < x \le a + b$ is equal to
the probability of falling in the doubly cross-hatched right triangle
(Fig. 15). This probability is readily found to be

$$F_\eta(x) = \frac{(x - 2a)^2}{2(b-a)^2}$$

For $a + b < x \le 2b$, the probability of the inequality $\xi_1 + \xi_2 < x$
is equal to the probability of falling in the entire shaded figure.
This probability is

$$F_\eta(x) = 1 - \frac{(2b - x)^2}{2(b-a)^2}$$

Differentiation with respect to x leads us to the formula (5).

In connection with the example we have considered it is interesting
to note the following.

General problems of geometry led N. I. Lobachevsky to the
necessity of solving the following problem: given a group of n
mutually independent random variables $\xi_1, \xi_2, \ldots, \xi_n$ (errors of
observation), find the probability distribution of their arithmetic
mean.

He solved this problem only for the case when all errors are
uniformly distributed in the interval (-1, 1), and it was found
that the probability of an arithmetic-mean error lying within the
limits from -x to x is

$$1 - \frac{1}{2^{n-1}} \sum_{r} (-1)^r\, \frac{(n - nx - 2r)^n}{r!\,(n-r)!}$$

where the summation extends over all integral r from r = 0 to
$r = \left[\dfrac{n - nx}{2}\right]$.

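Example 2. A two-dimensional random variable $(\xi_1, \xi_2)$ is distributed
in accordance with the normal law:

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{(x-a)^2}{\sigma_1^2} - 2r\,\frac{(x-a)(y-b)}{\sigma_1\sigma_2} + \frac{(y-b)^2}{\sigma_2^2}\right]\right\}$$

Find the distribution function of the sum $\eta = \xi_1 + \xi_2$.
According to the formula (3)

$$p_\eta(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{(z-a)^2}{\sigma_1^2} - 2r\,\frac{(z-a)(x-z-b)}{\sigma_1\sigma_2} + \frac{(x-z-b)^2}{\sigma_2^2}\right]\right\} dz$$

For the sake of brevity, denote x - a - b by v and z - a by u;
then

$$p_\eta(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{u^2}{\sigma_1^2} - 2r\,\frac{u(v-u)}{\sigma_1\sigma_2} + \frac{(v-u)^2}{\sigma_2^2}\right]\right\} du$$

Since

$$\frac{u^2}{\sigma_1^2} - 2r\,\frac{u(v-u)}{\sigma_1\sigma_2} + \frac{(v-u)^2}{\sigma_2^2} = u^2\,\frac{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}{\sigma_1^2\sigma_2^2} - 2uv\,\frac{\sigma_1 + r\sigma_2}{\sigma_1\sigma_2^2} + \frac{v^2}{\sigma_2^2} =$$

$$= \left[u\,\frac{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}{\sigma_1\sigma_2} - \frac{v}{\sigma_2}\,\frac{\sigma_1 + r\sigma_2}{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}\right]^2 + \frac{v^2}{\sigma_2^2}\left(1 - \frac{(\sigma_1 + r\sigma_2)^2}{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}\right) =$$

$$= \left[u\,\frac{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}{\sigma_1\sigma_2} - \frac{v}{\sigma_2}\,\frac{\sigma_1 + r\sigma_2}{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}\right]^2 + \frac{v^2 (1 - r^2)}{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}$$

it follows that by introducing the notation

$$t = \frac{1}{\sqrt{1-r^2}}\left[u\,\frac{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}{\sigma_1\sigma_2} - \frac{v}{\sigma_2}\,\frac{\sigma_1 + r\sigma_2}{\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}}\right]$$

we will reduce the expression for $p_\eta(x)$ to the form

$$p_\eta(x) = \frac{\exp\left\{-\dfrac{v^2}{2(\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2)}\right\}}{2\pi\sqrt{\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2}} \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\, dt$$

Since

$$v = x - a - b \quad \text{and} \quad \int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\, dt = \sqrt{2\pi}$$

it follows that

$$p_\eta(x) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2)}} \exp\left\{-\frac{(x - a - b)^2}{2(\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2)}\right\}$$

In particular, if the random variables $\xi_1$ and $\xi_2$ are independent,
then r = 0, and the formula for $p_\eta(x)$ takes on the form

$$p_\eta(x) = \frac{1}{\sqrt{2\pi(\sigma_1^2 + \sigma_2^2)}} \exp\left\{-\frac{(x - a - b)^2}{2(\sigma_1^2 + \sigma_2^2)}\right\}$$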




We have thus obtained the following result: the sum of the components 
of a normally distributed random vector is distributed according to the 
normal law. 
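The result can be checked numerically by integrating the two-dimensional normal density $p(z, x - z)$ over z and comparing with a normal density having mean a + b and variance $\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2$. The sketch below is an illustration under stated assumptions: the function names, the integration range of twelve standard deviations and the midpoint rule are choices made here.

```python
import math


def bivariate_normal_density(x, y, a, b, s1, s2, r):
    """Two-dimensional normal density with means a, b and correlation r."""
    u = (x - a) / s1
    v = (y - b) / s2
    q = (u * u - 2.0 * r * u * v + v * v) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r))


def sum_density_numeric(x, a, b, s1, s2, r, half_width=12.0, steps=4000):
    """Midpoint-rule value of the integral of p(z, x - z) over z."""
    lo = a - half_width * s1
    h = 2.0 * half_width * s1 / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += bivariate_normal_density(z, x - z, a, b, s1, s2, r)
    return total * h


def sum_density_closed_form(x, a, b, s1, s2, r):
    """Normal density with mean a + b and variance s1^2 + 2 r s1 s2 + s2^2."""
    var = s1 * s1 + 2.0 * r * s1 * s2 + s2 * s2
    return math.exp(-(x - a - b) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


params = dict(a=1.0, b=-0.5, s1=1.0, s2=2.0, r=-0.5)
gaps = [abs(sum_density_numeric(x, **params) - sum_density_closed_form(x, **params))
        for x in (-2.0, 0.5, 3.0)]
```

Because the integrand is a smooth, rapidly decaying function of z, the midpoint rule reproduces the closed form to many decimal places.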

It is interesting to note that when the summands are independent,
the converse proposition (Cramér's theorem) holds: if the sum of two
independent random variables is distributed in accordance with the normal
law, then each summand is also normally distributed. We shall not go
into the proof of this proposition since it requires a more intricate
mathematical apparatus.

Example 3. The $\chi^2$ Distribution. Let $\xi_1, \xi_2, \ldots, \xi_n$ be independent
random variables distributed according to one and the same normal
law with parameters a and $\sigma$.

The distribution function of the variable

$$\chi^2 = \frac{1}{\sigma^2} \sum_{k=1}^{n} (\xi_k - a)^2$$

is called the $\chi^2$ distribution (or chi-square distribution).

This distribution plays an important role in various problems
of statistics.

We shall now compute the distribution function of the variable
$\zeta = \dfrac{\chi}{\sqrt{n}}$. It will be independent of a and $\sigma$.

It is obvious that for negative values of the argument, the
distribution function $\Phi(y)$ of the variable $\zeta$ is zero; for positive
values of y the function $\Phi(y)$ is equal to the probability of the
point $(\xi_1, \xi_2, \ldots, \xi_n)$ falling within the sphere

$$\sum_{k=1}^{n} (x_k - a)^2 = y^2 \cdot n \cdot \sigma^2$$

Thus,

$$\Phi(y) = \int \ldots \int_{\sum (x_k - a)^2 < y^2 n \sigma^2} \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - a)^2\right\} dx_1\, dx_2 \ldots dx_n$$

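To compute this integral, we pass to spherical coordinates, that
is, we perform the substitution

$$x_1 - a = \sigma\rho \cos\theta_1 \cos\theta_2 \ldots \cos\theta_{n-1}$$
$$x_2 - a = \sigma\rho \cos\theta_1 \cos\theta_2 \ldots \sin\theta_{n-1}$$
$$\ldots\ldots\ldots$$
$$x_n - a = \sigma\rho \sin\theta_1$$

As a result of this substitution,

$$\Phi(y) = \int_{-\pi/2}^{\pi/2} \ldots \int_{-\pi/2}^{\pi/2} \int_{-\pi}^{\pi} \int_0^{y\sqrt{n}} \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\frac{\rho^2}{2}}\, \rho^{n-1}\, D(\theta_1, \ldots, \theta_{n-1})\, d\rho\, d\theta_{n-1} \ldots d\theta_1 = C_n \int_0^{y\sqrt{n}} e^{-\frac{\rho^2}{2}}\, \rho^{n-1}\, d\rho$$

where $D(\theta_1, \ldots, \theta_{n-1})$ is the angular factor of the Jacobian and the constant

$$C_n = \left(\frac{1}{\sqrt{2\pi}}\right)^n \int_{-\pi/2}^{\pi/2} \ldots \int_{-\pi/2}^{\pi/2} \int_{-\pi}^{\pi} D(\theta_1, \ldots, \theta_{n-1})\, d\theta_{n-1} \ldots d\theta_1$$

is dependent solely on n.

This constant is readily calculable by using the equation

$$\Phi(+\infty) = 1 = C_n \int_0^{\infty} e^{-\frac{\rho^2}{2}}\, \rho^{n-1}\, d\rho = C_n\, \Gamma\left(\frac{n}{2}\right) \cdot 2^{\frac{n}{2} - 1}$$

From this we find

$$\Phi(y) = \frac{1}{2^{\frac{n}{2} - 1}\, \Gamma\left(\frac{n}{2}\right)} \int_0^{y\sqrt{n}} \rho^{n-1}\, e^{-\frac{\rho^2}{2}}\, d\rho$$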

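The density function of the random variable $\zeta$ for $y \ge 0$ is

$$\varphi(y) = \frac{\sqrt{2n}}{\Gamma\left(\frac{n}{2}\right)} \left(y\sqrt{\frac{n}{2}}\right)^{n-1} e^{-\frac{n y^2}{2}} \qquad (6)$$

Whence, in particular, for n = 1, we naturally get a density function
equal to twice the density of the initial normal law:

$$\varphi(y) = \sqrt{\frac{2}{\pi}}\, e^{-\frac{y^2}{2}} \qquad (y \ge 0)$$

For n = 3 we get the familiar Maxwell law

$$\varphi(y) = \frac{3\sqrt{6}}{\sqrt{\pi}}\, y^2\, e^{-\frac{3y^2}{2}}$$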


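It is easy to derive the density function of the variable $\chi^2$
either by computations similar to those carried out or directly from
formula (6). When $x \ge 0$, this density function is

$$p_n(x) = \frac{x^{\frac{n}{2} - 1}\, e^{-\frac{x}{2}}}{2^{\frac{n}{2}}\, \Gamma\left(\frac{n}{2}\right)}$$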
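The densities of $\chi/\sqrt{n}$ and of $\chi^2$ are tied together by a change of variable, and their special cases can be verified directly. The sketch below is only an illustration; the function names and the midpoint integrator are assumptions, and the value 0 is returned at the single point x = 0 to avoid a division by zero for n < 2.

```python
import math


def chi_over_sqrt_n_density(y, n):
    """Density of chi / sqrt(n) for y >= 0 (formula (6) of this section)."""
    if y < 0:
        return 0.0
    return (math.sqrt(2 * n) / math.gamma(n / 2)
            * (y * math.sqrt(n / 2)) ** (n - 1) * math.exp(-n * y * y / 2))


def chi_square_density(x, n):
    """Density of chi-square with n degrees of freedom."""
    if x <= 0:
        return 0.0  # the single point x = 0 does not affect any integral
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def integrate(f, lo, hi, steps=100_000):
    """Plain midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


y = 0.7
half_normal = 2.0 * math.exp(-y * y / 2) / math.sqrt(2 * math.pi)           # case n = 1
maxwell = 3 * math.sqrt(6) / math.sqrt(math.pi) * y * y * math.exp(-1.5 * y * y)  # case n = 3
total_mass = integrate(lambda x: chi_square_density(x, 4), 0.0, 100.0)
# Change of variable: if zeta = chi / sqrt(n), its density is 2 n y p_n(n y^2).
via_chi_square = 2 * 5 * 0.9 * chi_square_density(5 * 0.9 ** 2, 5)
```

The last line expresses the relation between the two densities that the text alludes to when it says the $\chi^2$ density follows directly from formula (6).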



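The following is a table of variables associated with $\chi^2$ and
frequently used in practical problems:

| Variable | Density function for $x \ge 0$ |
|---|---|
| $\chi^2 = \dfrac{1}{\sigma^2} \sum_{k=1}^{n} (\xi_k - a)^2$ | $\dfrac{x^{\frac{n}{2}-1}\, e^{-\frac{x}{2}}}{2^{\frac{n}{2}}\, \Gamma\left(\frac{n}{2}\right)}$ |
| $\dfrac{\chi^2}{n} = \dfrac{1}{n\sigma^2} \sum_{k=1}^{n} (\xi_k - a)^2$ | $\dfrac{\left(\frac{n}{2}\right)^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\, x^{\frac{n}{2}-1}\, e^{-\frac{nx}{2}}$ |
| $\chi = \dfrac{1}{\sigma} \sqrt{\sum_{k=1}^{n} (\xi_k - a)^2}$ | $\dfrac{2\, x^{n-1}\, e^{-\frac{x^2}{2}}}{2^{\frac{n}{2}}\, \Gamma\left(\frac{n}{2}\right)}$ |
| $\dfrac{\chi}{\sqrt{n}} = \dfrac{1}{\sigma\sqrt{n}} \sqrt{\sum_{k=1}^{n} (\xi_k - a)^2}$ | $\dfrac{\sqrt{2n}}{\Gamma\left(\frac{n}{2}\right)} \left(x\sqrt{\dfrac{n}{2}}\right)^{n-1} e^{-\frac{nx^2}{2}}$ |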


Example 4. The Distribution Function of a Quotient. Let the
probability density function of a variable $(\xi, \eta)$ be p(x, y). It is
required to find the distribution function of the quotient $\zeta = \dfrac{\xi}{\eta}$.

By definition,

$$F_\zeta(x) = P\left\{\frac{\xi}{\eta} < x\right\}$$

If $\xi$ and $\eta$ are regarded as rectangular Cartesian coordinates of
a point in a plane, then $F_\zeta(x)$ is equal to the probability that
the point $(\xi, \eta)$ will fall in a region whose point coordinates satisfy
the inequality $\dfrac{\xi}{\eta} < x$. This region is shaded in Fig. 16.

[Fig. 16]

According to the general formula, the sought-for probability is

$$F_\zeta(x) = \int_0^{\infty} \int_{-\infty}^{zx} p(y, z)\, dy\, dz + \int_{-\infty}^{0} \int_{zx}^{\infty} p(y, z)\, dy\, dz \qquad (7)$$

Whence it follows that if $\xi$ and $\eta$ are independent, and $p_1(x)$ and
$p_2(x)$ are their density functions, then

$$F_\zeta(x) = \int_0^{\infty} F_1(xz)\, p_2(z)\, dz + \int_{-\infty}^{0} [1 - F_1(xz)]\, p_2(z)\, dz \qquad (7')$$

Differentiating (7) we find

$$p_\zeta(x) = \int_0^{\infty} z\, p(zx, z)\, dz - \int_{-\infty}^{0} z\, p(zx, z)\, dz = \int_{-\infty}^{\infty} |z|\, p(zx, z)\, dz \qquad (8)$$

In particular, if $\xi$ and $\eta$ are independent, then

$$p_\zeta(x) = \int_0^{\infty} z\, p_1(zx)\, p_2(z)\, dz - \int_{-\infty}^{0} z\, p_1(zx)\, p_2(z)\, dz \qquad (8')$$





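Example 5. The random variable $(\xi, \eta)$ is distributed according
to the normal law:

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{x^2}{\sigma_1^2} - 2r\,\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]\right\}$$

Find the distribution function of the quotient $\zeta = \dfrac{\xi}{\eta}$.
From formula (8)

$$p_\zeta(x) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \left[\int_0^{\infty} - \int_{-\infty}^{0}\right] z \exp\left\{-\frac{z^2}{2(1-r^2)}\, \frac{\sigma_2^2 x^2 - 2r\sigma_1\sigma_2 x + \sigma_1^2}{\sigma_1^2\sigma_2^2}\right\} dz =$$

$$= \frac{1}{\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int_0^{\infty} z \exp\left\{-\frac{z^2}{2(1-r^2)}\, \frac{\sigma_2^2 x^2 - 2r\sigma_1\sigma_2 x + \sigma_1^2}{\sigma_1^2\sigma_2^2}\right\} dz$$

Perform a substitution in the integral by putting

$$u = \frac{z^2}{2(1-r^2)}\, \frac{\sigma_2^2 x^2 - 2r\sigma_1\sigma_2 x + \sigma_1^2}{\sigma_1^2\sigma_2^2}$$

The expression for $p_\zeta(x)$ will then become

$$p_\zeta(x) = \frac{\sigma_1\sigma_2\sqrt{1-r^2}}{\pi(\sigma_2^2 x^2 - 2r\sigma_1\sigma_2 x + \sigma_1^2)} \int_0^{\infty} e^{-u}\, du = \frac{\sigma_1\sigma_2\sqrt{1-r^2}}{\pi(\sigma_2^2 x^2 - 2r\sigma_1\sigma_2 x + \sigma_1^2)}$$

In particular, if the variables $\xi$ and $\eta$ are independent, then

$$p_\zeta(x) = \frac{\sigma_1\sigma_2}{\pi(\sigma_1^2 + \sigma_2^2 x^2)}$$

The density function of the variable $\zeta$ is called Cauchy's law.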
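The closed form for the density of a quotient of two zero-mean normal components can be compared with a direct numerical evaluation of the integral of $|z|\,p(zx, z)$ over z. The sketch below is illustrative: the function names, the symmetric grid and the truncation at twelve standard deviations of $\eta$ are assumptions made for this check.

```python
import math


def centered_normal_density(x, y, s1, s2, r):
    """Two-dimensional normal density with zero means and correlation r."""
    q = (x * x / (s1 * s1) - 2.0 * r * x * y / (s1 * s2) + y * y / (s2 * s2)) / (1.0 - r * r)
    return math.exp(-0.5 * q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - r * r))


def quotient_density_numeric(x, s1, s2, r, half_width=12.0, steps=40_000):
    """Midpoint-rule value of the integral of |z| p(zx, z) over z.

    An even number of steps on a symmetric range keeps z = 0 on a cell edge.
    """
    lo = -half_width * s2
    h = 2.0 * half_width * s2 / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += abs(z) * centered_normal_density(z * x, z, s1, s2, r)
    return total * h


def quotient_density_closed_form(x, s1, s2, r):
    """Closed form obtained in Example 5 (a Cauchy-type law)."""
    return (s1 * s2 * math.sqrt(1.0 - r * r)
            / (math.pi * (s2 * s2 * x * x - 2.0 * r * s1 * s2 * x + s1 * s1)))


s1, s2, r = 1.5, 0.7, 0.3
errors = [abs(quotient_density_numeric(x, s1, s2, r) - quotient_density_closed_form(x, s1, s2, r))
          for x in (-1.0, 0.0, 0.8, 3.0)]
```

For r = 0 and $\sigma_1 = \sigma_2$ the closed form reduces to $1/(\pi(1 + x^2))$, the standard Cauchy density.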

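Example 6. Student's Distribution. Find the distribution function
of the quotient $\zeta = \dfrac{\xi}{\eta}$, where $\xi$ and $\eta$ are independent variables,
$\xi$ being distributed according to the normal law:

$$p_\xi(x) = \sqrt{\frac{n}{2\pi}}\, e^{-\frac{n x^2}{2}}$$

and $\eta = \dfrac{\chi}{\sqrt{n}}$ (see Example 3), so that

$$p_\eta(x) = \frac{\sqrt{2n}}{\Gamma\left(\frac{n}{2}\right)} \left(x\sqrt{\frac{n}{2}}\right)^{n-1} e^{-\frac{n x^2}{2}} \qquad (x \ge 0)$$

According to the formula (8')

$$p_\zeta(x) = \int_0^{\infty} z\, \sqrt{\frac{n}{2\pi}}\, e^{-\frac{n z^2 x^2}{2}}\, \frac{\sqrt{2n}}{\Gamma\left(\frac{n}{2}\right)} \left(z\sqrt{\frac{n}{2}}\right)^{n-1} e^{-\frac{n z^2}{2}}\, dz = \frac{1}{\sqrt{\pi}\, \Gamma\left(\frac{n}{2}\right)} \int_0^{\infty} \left(z\sqrt{\frac{n}{2}}\right)^{n-1} e^{-\frac{n z^2}{2}(x^2 + 1)}\, nz\, dz$$

Substituting

$$u = \frac{n z^2}{2}\, (x^2 + 1)$$

we find that

$$p_\zeta(x) = \frac{(x^2 + 1)^{-\frac{n+1}{2}}}{\sqrt{\pi}\, \Gamma\left(\frac{n}{2}\right)} \int_0^{\infty} u^{\frac{n-1}{2}}\, e^{-u}\, du = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi}\, \Gamma\left(\frac{n}{2}\right)}\, (x^2 + 1)^{-\frac{n+1}{2}}$$

The probability density function

$$p_\zeta(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi}\, \Gamma\left(\frac{n}{2}\right)}\, (x^2 + 1)^{-\frac{n+1}{2}}$$

is called Student's law*.

For n = 1, Student's law becomes Cauchy's law.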
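The derivation can be checked numerically in the normalization used in this example, where the numerator is normal with variance 1/n and the denominator is $\chi/\sqrt{n}$; this differs from the more common form of Student's density by a scale factor $\sqrt{n}$ in the argument. The sketch below is illustrative only: the function names, the upper integration limit and the midpoint rule are assumptions.

```python
import math


def numerator_density(x, n):
    """Normal density with zero mean and variance 1/n."""
    return math.sqrt(n / (2.0 * math.pi)) * math.exp(-n * x * x / 2.0)


def denominator_density(y, n):
    """Density of chi / sqrt(n) for y >= 0."""
    return (math.sqrt(2 * n) / math.gamma(n / 2)
            * (y * math.sqrt(n / 2)) ** (n - 1) * math.exp(-n * y * y / 2))


def student_density(x, n):
    """Closed form obtained in this example."""
    return (math.gamma((n + 1) / 2) / (math.sqrt(math.pi) * math.gamma(n / 2))
            * (x * x + 1.0) ** (-(n + 1) / 2))


def quotient_density_numeric(x, n, upper=12.0, steps=40_000):
    """Midpoint-rule value of the integral of z p1(zx) p2(z) over z > 0."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += z * numerator_density(z * x, n) * denominator_density(z, n)
    return total * h


worst = max(abs(quotient_density_numeric(x, n) - student_density(x, n))
            for n in (1, 2, 5) for x in (0.0, 0.4, -1.3))
cauchy_at_point = 1.0 / (math.pi * (1.0 + 0.4 ** 2))
```

The case n = 1 reproduces the Cauchy density $1/(\pi(1 + x^2))$, in agreement with the remark that closes the example.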

Example 7. Rotation of the Coordinate Axes. Given the distribution
function of a two-dimensional random variable $(\xi, \eta)$. Find the
distribution function of the variables

$$\xi' = \xi\cos\alpha + \eta\sin\alpha, \qquad \eta' = -\xi\sin\alpha + \eta\cos\alpha \qquad (9)$$

Denote by F(x, y) and $\Phi(x, y)$ the distribution functions of the
variables $(\xi, \eta)$ and $(\xi', \eta')$. If we regard $(\xi, \eta)$ and $(\xi', \eta')$
as the rectangular Cartesian coordinates of a point in the plane,
then it will be easy to see that the coordinate system $O\xi'\eta'$ is
obtained from the system $O\xi\eta$ by rotation of the axes through the
angle $\alpha$. We confine ourselves to the case $0 < \alpha < \dfrac{\pi}{2}$, leaving it
to the reader to derive analogous formulas for the remaining values
of $\alpha$.


* Student was the pseudonym of the English statistician W. S. Gosset,
who first discovered this law in empirical fashion.






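Let the coordinates of the point M in the $O\xi\eta$ system be x and y
and in the $O\xi'\eta'$ system, x' and y'. Then the function F(x, y) is equal
to the probability of the point $(\xi, \eta)$ falling in the right angle bounded
by the half-lines AM and BM (Fig. 17), and the function $\Phi(x', y')$
is equal to the probability of the point $(\xi, \eta)$ falling in the angle
bounded by the half-lines CM and DM. The equations of the straight
lines CM and DM in the coordinate system $O\xi\eta$ are of the form: for
CM,

$$\eta = (\xi - x)\tan\alpha + y$$

and for DM

$$\eta = -(\xi - x)\cot\alpha + y$$

Since (x, y) and (x', y') are connected by the equations

$$x' = x\cos\alpha + y\sin\alpha, \qquad y' = -x\sin\alpha + y\cos\alpha$$

these equations may be written in a different form:

$$\eta = \xi\tan\alpha + \frac{y'}{\cos\alpha}, \qquad \eta = -\xi\cot\alpha + \frac{x'}{\sin\alpha}$$

By virtue of what has already been said,

$$\Phi(x', y') = \iint p(\xi, \eta)\, d\xi\, d\eta$$

The integral is extended over the interior part of the angle CMD.
It is easy to see that

$$\Phi(x', y') = \int_{-\infty}^{x} \int_{-\infty}^{\xi\tan\alpha + \frac{y'}{\cos\alpha}} p(\xi, \eta)\, d\eta\, d\xi + \int_{x}^{\infty} \int_{-\infty}^{-\xi\cot\alpha + \frac{x'}{\sin\alpha}} p(\xi, \eta)\, d\eta\, d\xi$$

where $x = x'\cos\alpha - y'\sin\alpha$. Differentiating this equation with
respect to x' and y', we find

$$\pi(x', y') = \frac{\partial^2 \Phi(x', y')}{\partial x'\, \partial y'} = p(x, y) = p(x'\cos\alpha - y'\sin\alpha,\ x'\sin\alpha + y'\cos\alpha) \qquad (10)$$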


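Example 8. The two-dimensional random variable $(\xi, \eta)$ is distributed
according to the normal law:

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{x^2}{\sigma_1^2} - 2r\,\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right]\right\}$$

Find the density function of the random variables

$$\xi' = \xi\cos\alpha + \eta\sin\alpha, \qquad \eta' = -\xi\sin\alpha + \eta\cos\alpha$$

By (10) we have

$$\pi(x', y') = p(x'\cos\alpha - y'\sin\alpha,\ x'\sin\alpha + y'\cos\alpha) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[A x'^2 - 2B x' y' + C y'^2\right]\right\}$$

where

$$A = \frac{\cos^2\alpha}{\sigma_1^2} - 2r\,\frac{\cos\alpha\sin\alpha}{\sigma_1\sigma_2} + \frac{\sin^2\alpha}{\sigma_2^2}$$

$$B = \frac{\cos\alpha\sin\alpha}{\sigma_1^2} - r\,\frac{\sin^2\alpha - \cos^2\alpha}{\sigma_1\sigma_2} - \frac{\cos\alpha\sin\alpha}{\sigma_2^2}$$

$$C = \frac{\sin^2\alpha}{\sigma_1^2} + 2r\,\frac{\cos\alpha\sin\alpha}{\sigma_1\sigma_2} + \frac{\cos^2\alpha}{\sigma_2^2}$$

From the formula obtained we conclude that rotation of the coordinate
axes transforms a normal distribution into a normal distribution.

It will be noted that if the angle $\alpha$ is chosen so that

$$\tan 2\alpha = \frac{2r\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2}$$

then B = 0 and

$$\pi(x', y') = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\, e^{-\frac{A x'^2}{2(1-r^2)}}\, e^{-\frac{C y'^2}{2(1-r^2)}}$$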

This equation implies that any normally distributed two-dimensional
random variable may, by rotation of the coordinate axes, be reduced
to a system of two normally distributed independent random variables.
This result can be extended to n-dimensional random variables.
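The choice of angle that removes the cross term can be verified numerically. In the sketch below the coefficients A, B, C are those of the quadratic form $A x'^2 - 2B x' y' + C y'^2$ obtained after rotation; the function names are assumptions, and `atan2` is used so that the case $\sigma_1 = \sigma_2$ (where $\tan 2\alpha$ is infinite) is handled without division by zero. The angle returned may lie outside $(0, \pi/2)$ when r is negative.

```python
import math


def rotated_coefficients(s1, s2, r, alpha):
    """Coefficients A, B, C of the rotated quadratic form."""
    c, s = math.cos(alpha), math.sin(alpha)
    A = c * c / s1 ** 2 - 2.0 * r * c * s / (s1 * s2) + s * s / s2 ** 2
    B = c * s / s1 ** 2 - r * (s * s - c * c) / (s1 * s2) - c * s / s2 ** 2
    C = s * s / s1 ** 2 + 2.0 * r * c * s / (s1 * s2) + c * c / s2 ** 2
    return A, B, C


def decorrelating_angle(s1, s2, r):
    """Angle alpha with tan(2 alpha) = 2 r s1 s2 / (s1^2 - s2^2)."""
    return 0.5 * math.atan2(2.0 * r * s1 * s2, s1 ** 2 - s2 ** 2)


s1, s2, r = 2.0, 1.0, 0.6
alpha = decorrelating_angle(s1, s2, r)
A, B, C = rotated_coefficients(s1, s2, r, alpha)

# With B = 0 the rotated components are independent with these variances.
var_first = (1.0 - r * r) / A
var_second = (1.0 - r * r) / C
```

Two invariants give a quick sanity check: the sum of the two variances equals $\sigma_1^2 + \sigma_2^2$ and their product equals $\sigma_1^2\sigma_2^2(1 - r^2)$, since a rotation preserves the trace and determinant of the covariance matrix.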

It is possible to prove a stronger proposition that exhaustively
describes a normal probability distribution. Let there be a nondegenerate
(i.e., not concentrated on a single straight line) probability
distribution in the plane. For this distribution to be normal, it is necessary
and sufficient that it be possible, in two different ways, to choose
in the plane the coordinate axes $O\xi_1\xi_2$ and $O\eta_1\eta_2$ such that the
coordinates $\xi_1$ and $\xi_2$ (just as $\eta_1$ and $\eta_2$), regarded as random variables
with a given probability distribution, should be independent.


Sec. 25. The Stieltjes Integral 

Substantial use will be made of the Stieltjes integral in what
follows, and so to facilitate the study of subsequent sections we give
here a definition and the basic properties of the Stieltjes integral
without dwelling on proofs.





Suppose in an interval (a, b) we have defined the function f(x)
and a nondecreasing function F(x) of bounded variation. For the
sake of definiteness we will assume here that the function F(x) is
continuous on the left. If a and b are finite, we partition the interval
(a, b) into a finite number of subintervals $(x_{i-1}, x_i)$ by means of the
points $a = x_0 < x_1 < x_2 < \ldots < x_n = b$ and form the sum

$$\sum_{i=1}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})]$$

where $\tilde{x}_i$ is an arbitrary number chosen in the interval $(x_{i-1}, x_i)$.

We now increase the number of partition points and at the same
time let the length of the largest subinterval approach zero. If in
this process the foregoing sum tends to a definite limit

$$J = \lim_{n \to \infty} \sum_{i=1}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] \qquad (1)$$

then this limit is called the Stieltjes integral of the function f(x)
with respect to the integrating function F(x) and is denoted by
the symbol

$$J = \int_a^b f(x)\, dF(x) \qquad (2)$$

When the interval of integration is infinite, the improper Stieltjes
integral is defined in the usual way: the integral is considered
over an arbitrary finite interval (a, b); the quantities a and b are
made to approach $-\infty$ and $+\infty$ in arbitrary fashion; if there
exists a limit

$$\lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b f(x)\, dF(x)$$

then this limit is called the Stieltjes integral of the function f(x)
with respect to the function F(x) in the interval $(-\infty, \infty)$ and
is denoted by

$$\int f(x)\, dF(x)$$

It may be proved that if the function f(x) is continuous and bounded,
then the limit of the sum (1) exists in the case of both finite
and infinite limits of integration.

In certain cases the Stieltjes integral exists for unbounded functions
f(x) as well. Such integrals are of considerable interest to
probability theory (expectation, variance, moments, and others).





Everywhere henceforth we will consider that the integral of a function
f(x) exists when and only when there exists an integral of |f(x)|
with respect to the same integrating function F(x).

For the purposes of probability theory it is important to extend
the definition of the Stieltjes integral to the case when the function
f(x) may have a finite or countable set of discontinuity points. It
may be proved * that any bounded function having a finite or countable
set of points of discontinuity, in particular, any function of
bounded variation, is integrable with respect to any integrating function
of bounded variation. It is then necessary to modify somewhat
the definition of the Stieltjes integral; namely, when forming the limit
(1) it is necessary to consider only those sequences of subdivisions
of the interval of integration such that each point of discontinuity
of f(x) is one of the partition points of all subdivisions with the
exception, perhaps, of a finite number of them.

It should be noted that when establishing the limits of integration
it is important to indicate whether one or another end point of the
interval of integration is included or not. Indeed, from the definition
of the Stieltjes integral we obtain the following equation (the symbol
a - 0 implies that a is included in the interval of integration and the
symbol a + 0 that a is excluded):

$$\int_{a-0}^{b} f(x)\, dF(x) = \lim_{n \to \infty} \sum_{i=1}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] =$$

$$= \lim_{n \to \infty} \sum_{i=2}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] + \lim_{n \to \infty} f(\tilde{x}_1)\, [F(x_1) - F(x_0)] =$$

$$= \int_{a+0}^{b} f(x)\, dF(x) + f(a)\, [F(a+0) - F(a)]$$

Thus, if $f(a) \ne 0$ and the function F(x) has a jump at x = a, then

$$\int_{a-0}^{b} f(x)\, dF(x) - \int_{a+0}^{b} f(x)\, dF(x) = f(a)\, [F(a+0) - F(a-0)]$$

This means that the Stieltjes integral, extended over an interval
that reduces to a single point, can yield a result different from
zero. From now on we shall agree that unless otherwise stated
the right end point of the interval will be excluded and the left
end point will be included in the interval of integration. This
condition permits us to write the following equation:

$$\int_a^b dF(x) = F(b) - F(a)$$

* See V. I. Glivenko, The Stieltjes Integral, p. 116 (in Russian). 





Indeed, by definition,

$$\int_a^b dF(x) = \lim_{n \to \infty} \sum_{i=1}^{n} [F(x_i) - F(x_{i-1})] = \lim_{n \to \infty} [F(x_n) - F(x_0)] = F(b) - F(a)$$

(recall that by definition F(x) is continuous on the left and,
hence, for it $F(b) = \lim_{\varepsilon \to 0} F(b - \varepsilon)$).

In particular, if F(x) is the distribution function of a random
variable $\xi$, then

$$\int_a^b dF(x) = F(b) - F(a) = P\{a \le \xi < b\}$$

$$\int_{-\infty}^{b} dF(x) = F(b) = P\{\xi < b\}$$

If F(x) has a derivative p(x) of which it is an integral, then from
the fact that by the formula of finite increments

$$F(x_i) - F(x_{i-1}) = p(\tilde{x}_i)\, (x_i - x_{i-1})$$

where $x_{i-1} < \tilde{x}_i < x_i$, there follows the equation

$$\int_a^b f(x)\, dF(x) = \lim_{n \to \infty} \sum_{i=1}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] = \lim_{n \to \infty} \sum_{i=1}^{n} f(\tilde{x}_i)\, p(\tilde{x}_i)\, (x_i - x_{i-1}) = \int_a^b f(x)\, p(x)\, dx$$

We see that in this case the Stieltjes integral reduces to an ordinary
integral.

If F(x) has a jump at the point x = c, then by selecting the
subdivisions so that for certain values of the subscript $x_k < c < x_{k+1}$
we have

$$\int_a^b f(x)\, dF(x) = \lim_{n \to \infty} \sum_{i=1}^{k} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] + f(c)\, [F(x_{k+1}) - F(x_k)] + \lim_{n \to \infty} \sum_{i=k+2}^{n} f(\tilde{x}_i)\, [F(x_i) - F(x_{i-1})] =$$

$$= \int_a^c f(x)\, dF(x) + \int_{c+0}^{b} f(x)\, dF(x) + f(c)\, [F(c+0) - F(c-0)]$$


In particular, if the value of the function F(x) changes only at
the points $c_1, c_2, \ldots, c_n, \ldots$, then

$$\int f(x)\, dF(x) = \sum_{n=1}^{\infty} f(c_n)\, [F(c_n + 0) - F(c_n - 0)]$$

and the Stieltjes integral reduces to a series.
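The two reductions just described, to an ordinary integral where F has a density and to a sum over jumps where F is discontinuous, can be seen together in a single approximating sum. The sketch below is an illustration under assumptions: the function names, the uniform partition with midpoint tags, and the particular F (density 1/2 on [0, 1] plus a jump of 1/2 at c = 0.5, continuous on the left) are all chosen here for the example.

```python
def stieltjes_sum(f, F, a, b, n):
    """Approximating sum of f(tag_i) * [F(x_i) - F(x_{i-1})] on a uniform partition."""
    total = 0.0
    x_prev = a
    for i in range(1, n + 1):
        x_i = a + (b - a) * i / n
        tag = 0.5 * (x_prev + x_i)  # an arbitrary point of the subinterval
        total += f(tag) * (F(x_i) - F(x_prev))
        x_prev = x_i
    return total


def F(x):
    """Left-continuous integrating function: density 1/2 on [0, 1], jump 1/2 at 0.5."""
    continuous_part = 0.5 * min(max(x, 0.0), 1.0)
    jump_part = 0.5 if x > 0.5 else 0.0
    return continuous_part + jump_part


def f(x):
    return x * x


approx = stieltjes_sum(f, F, 0.0, 1.0, 10_000)
# Ordinary integral of f(x) * (1/2) over [0, 1], plus f(c) times the jump at c = 0.5.
expected = 0.5 * (1.0 / 3.0) + f(0.5) * 0.5
increment = stieltjes_sum(lambda x: 1.0, F, 0.0, 1.0, 1_000)
```

With f identically 1 the sum telescopes to F(b) - F(a), which is the equation written above for the integral of dF.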

We will now enumerate the principal properties of the Stieltjes
integral that we will need in what follows. The reader will find
no difficulty in providing the proofs of these properties by proceeding
from the definition of the Stieltjes integral and taking advantage
of the reasoning used in the theory of the ordinary integral.

1. For $a < c_1 < c_2 < \ldots < c_n < b$

$$\int_a^b f(x)\, dF(x) = \sum_{i=0}^{n} \int_{c_i}^{c_{i+1}} f(x)\, dF(x) \qquad [a = c_0,\ b = c_{n+1}]$$

2. A constant factor may be removed from under the integral sign:

$$\int_a^b c f(x)\, dF(x) = c \int_a^b f(x)\, dF(x)$$

3. The integral of a sum of functions is equal to the sum of
their integrals:

$$\int_a^b \sum_{i=1}^{n} f_i(x)\, dF(x) = \sum_{i=1}^{n} \int_a^b f_i(x)\, dF(x)$$

4. If $f(x) \ge 0$ and b > a, then

$$\int_a^b f(x)\, dF(x) \ge 0$$

5. If $F_1(x)$ and $F_2(x)$ are monotone functions of bounded variation
and $c_1$ and $c_2$ are arbitrary constants, then

$$\int_a^b f(x)\, d[c_1 F_1(x) + c_2 F_2(x)] = c_1 \int_a^b f(x)\, dF_1(x) + c_2 \int_a^b f(x)\, dF_2(x)$$

6. If $F(x) = \int_c^x g(u)\, dG(u)$, where c is a constant, g(u) is a continuous
function and G(u) is a nondecreasing function of bounded
variation, then

$$\int_a^b f(x)\, dF(x) = \int_a^b f(x)\, g(x)\, dG(x)$$





Employing the concept of the Stieltjes integral, we can write
general formulas for the distribution function of the sum of two
independent random variables, $\xi_1$ and $\xi_2$,

$$F(x) = \int F_1(x - z)\, dF_2(z) = \int F_2(x - z)\, dF_1(z)$$

and also for their quotient $\dfrac{\xi_1}{\xi_2}$

$$F(x) = \int_0^{\infty} F_1(xz)\, dF_2(z) + \int_{-\infty}^{0} [1 - F_1(xz)]\, dF_2(z)$$

on the assumption that $P\{\xi_2 = 0\} = 0$.


EXERCISES 

1. Prove that if F(x) is a distribution function, then for any $h \ne 0$ the
functions

$$\frac{1}{h} \int_x^{x+h} F(x)\, dx, \qquad \frac{1}{2h} \int_{x-h}^{x+h} F(x)\, dx$$

are also distribution functions.

2. A random variable $\xi$ has F(x) as its distribution function (p(x) is the
density function). Find the distribution function (density function) of the
random variable:

(a) $\eta = a\xi + b$, a and b are real numbers;

(b) $\eta = \xi^{-1}$ ($P\{\xi = 0\} = 0$);

(c) $\eta = \tan\xi$;

(d) $\eta = \cos\xi$;

(e) $\eta = f(\xi)$, where f(x) is a continuous monotone function without intervals
of constancy.

3. From the point (0, a) draw a straight line at an angle $\varphi$ to the y-axis.
Find the distribution function for the abscissa of the point of intersection of
this line with the x-axis if

(a) the angle $\varphi$ is uniformly distributed in the interval $\left(0, \dfrac{\pi}{2}\right)$;

(b) the angle $\varphi$ is uniformly distributed in the interval $\left(-\dfrac{\pi}{2}, \dfrac{\pi}{2}\right)$.

4. A point is thrown at random on the circumference of a circle of radius R
with centre at the coordinate origin [in other words, the polar angle of the point
of impact is uniformly distributed in the interval $(-\pi, \pi)$]. Find the density
function for:

(a) the abscissa of the point of impact;

(b) the length of the chord connecting the point of impact with the point
(-R, 0).

5. A point is thrown at random on the segment of the ordinate axis between
points (0, 0) and (0, R) [that is, the ordinate of the point is uniformly
distributed in the interval (0, R)]. Through the point of impact draw a chord of the
circle $x^2 + y^2 = R^2$ perpendicular to the y-axis. Find the distribution of length
of the chord.





6. The diameter of a circle is measured approximately. Considering that it is
uniformly distributed in the interval (a, b), find the distribution of the area of
the circle.

7. The density function of a random variable $\xi$ is given by the equation

$$p(x) = \frac{a}{e^{-x} + e^{x}}$$

Find:

(a) the constant a;

(b) the probability that in two independent observations $\xi$ will take on values
less than 1.

8. The distribution function of a random vector $(\xi, \eta)$ is of the form:

(a) $F(x, y) = F_1(x) F_2(y) + F_3(x)$;

(b) $F(x, y) = F_1(x) F_2(y) + F_3(x) + F_4(y)$.

Can the functions $F_3(x)$ and $F_4(x)$ be arbitrary? Are the components of the
vector $(\xi, \eta)$ dependent or independent?

9. Two points are dropped at random on the interval (0, a) [that is, their
abscissas are uniformly distributed on the interval (0, a)]. Find the distribution
function of the distance between them.

10. A total of n points are dropped on the interval (0, a). Assuming that the
points have been dispersed at random [that is, each of them is situated
irrespective of the others and is distributed uniformly on (0, a)], find:

(a) the density function of the abscissa of the kth point on the left;

(b) the joint density function of the abscissas of the kth and mth points on
the left (k < m).

11. A total of n independent trials are performed on a random variable $\xi$
having a continuous distribution function, as a result of which the following values
of the variable $\xi$ were observed: $x_1, x_2, \ldots, x_n$. Find the distribution functions
of the random variables:

(a) $\eta_n = \max(x_1, x_2, \ldots, x_n)$;

(b) $\zeta_n = \min(x_1, x_2, \ldots, x_n)$;

(c) the kth largest observation result;

(d) the joint distribution of the kth and mth largest observed values.

12. The distribution function of the random vector $(\xi_1, \xi_2, \ldots, \xi_n)$ is
$F(x_1, x_2, \ldots, x_n)$. As the result of a trial the components of the vector take on the
values $(z_1, z_2, \ldots, z_n)$. Find the distribution function of the random variable:

(a) $\eta_n = \max(z_1, z_2, \ldots, z_n)$;

(b) $\zeta_n = \min(z_1, z_2, \ldots, z_n)$.

13. The random variable $\xi$ has a continuous distribution function F(x). How
is the random variable $\eta = F(\xi)$ distributed?

14. The random variables $\xi$ and $\eta$ are independent; their density functions
are defined by the equations

$$p_\xi(x) = p_\eta(x) = 0 \quad \text{for } x \le 0$$

$$p_\xi(x) = c_1 x^{\alpha} e^{-\beta x}, \qquad p_\eta(x) = c_2 x^{\gamma} e^{-\beta x} \quad \text{for } x > 0$$

Find:

(a) the constants $c_1$ and $c_2$;

(b) the density function of the sum $\xi + \eta$.

15. Find the distribution function of the sum of the independent random
variables $\xi$ and $\eta$, the first of which is uniformly distributed in the interval
(-h, h), and the second has the distribution function F(x).

16. The density function of the random vector $(\xi, \eta, \zeta)$ is

$$p(x, y, z) = \begin{cases} \dfrac{6}{(1+x+y+z)^4} & \text{for } x > 0,\ y > 0,\ z > 0 \\ 0 & \text{otherwise} \end{cases}$$

Find the distribution of the variable $\xi + \eta + \zeta$.

17. Find the distribution of the sum of the independent random variables $\xi_1$
and $\xi_2$ if their distributions are given by the conditions:

(a) $F_1(x) = F_2(x) = \dfrac{1}{2} + \dfrac{1}{\pi}\arctan x$;

(b) uniform distribution in the intervals $(-5, 1)$, $(1, 5)$, respectively;

(c) $p_1(x) = p_2(x) = \dfrac{1}{2\alpha}\, e^{-|x|/\alpha}$.

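18. The density function of the independent random variables $\xi$ and $\eta$ is:

(a) $p_\xi(x) = p_\eta(x) = \begin{cases} 0 & \text{for } x \le 0 \\ a e^{-ax} & \text{for } x > 0 \ (a > 0) \end{cases}$

(b) $p_\xi(x) = p_\eta(x) = \begin{cases} 0 & \text{for } x \le 0,\ x > a \\ \dfrac{1}{a} & \text{for } 0 < x \le a \end{cases}$

Find the density function of the variable $\zeta = \xi/\eta$.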

19. Find the distribution function of the product of the independent factors
$\xi$ and $\eta$ on the basis of their distribution functions $F_\xi(x)$ and
$F_\eta(x)$.

20. The random variables $\xi$ and $\eta$ are independent and distributed as
follows:

(a) uniformly in the interval $(-a, a)$;

(b) normally with parameters $a = 0$, $\sigma = 1$.

Find the distribution function of their product.

21. The sides $\xi$ and $\eta$ of a triangle are independent random variables.
Using their distribution functions $F_\xi(x)$ and $F_\eta(x)$, find the
distribution function of the third side if the angle between the sides $\xi$
and $\eta$ is equal to a constant number $\alpha$.

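22. Prove that if the variables $\xi$ and $\eta$ are independent and their
density function is

$$p_\xi(x) = p_\eta(x) = \begin{cases} 0 & \text{for } x \le 0 \\ e^{-x} & \text{for } x > 0 \end{cases}$$

then the variables $\xi + \eta$ and $\xi/\eta$ are also independent.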

23. Prove that if the variables $\xi$ and $\eta$ are independent and normally
distributed with parameters $a_1 = a_2 = 0$, $\sigma_1 = \sigma_2 = \sigma$, then
the variables

$$\zeta = \xi^2 + \eta^2 \quad \text{and} \quad \tau = \frac{\xi}{\eta}$$

are also independent.

24. Prove that if the variables $\xi$ and $\eta$ are independent and distributed
in accordance with the chi-square law with parameters $m$ and $n$, then the
variables $\zeta = \xi/\eta$ and $\tau = \xi + \eta$ are independent.

25. The random variables $\xi_1, \xi_2, \ldots, \xi_n$ are independent and have
one and the same density function

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$


Find the two-dimensional density function of the variables 





26. Prove that any distribution function possesses the following properties:

$$\lim_{x\to\infty} x\int_x^{\infty}\frac{1}{z}\,dF(z) = 0, \qquad \lim_{x\to+0} x\int_x^{\infty}\frac{1}{z}\,dF(z) = 0,$$

$$\lim_{x\to-\infty} x\int_{-\infty}^{x}\frac{1}{z}\,dF(z) = 0, \qquad \lim_{x\to-0} x\int_{-\infty}^{x}\frac{1}{z}\,dF(z) = 0$$

27. Two series of independent trials are performed with a random variable $\xi$
which has a continuous distribution function $F(x)$. As a result, $\xi$ took on
values arranged in the order of increasing magnitude in each series:

$$x_1 < x_2 < \ldots < x_M; \qquad y_1 < y_2 < \ldots < y_N$$


What is the probability of the inequalities 

y^k ^ ^m+i ^ y^^.^i‘l 

where $m$ and $\mu$ are given numbers ($0 < m < M$, $0 < \mu < N$)?

28. The random variable $\xi$ has a continuous distribution function $F(x)$. As
a result of $n$ independent observations of $\xi$ we have the following values
$x_1 < x_2 < \ldots < x_n$ that are arranged in increasing order of magnitude.
Find the density function of the variable

$$\eta = \frac{F(x_n) - F(x_2)}{F(x_n) - F(x_1)}$$

29. The random variables $\xi$ and $\eta$ are independent and identically
distributed with the density function

$$p_\xi(x) = p_\eta(x) = \frac{C}{1 + x^4}$$

Find the constant $C$ and prove that the variable $\xi/\eta$ is distributed in
accordance with the Cauchy law.

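30. The random variables $\xi$ and $\eta$ are independent and their density
functions are, respectively, given by

$$p_\xi(x) = \frac{1}{\pi\sqrt{1 - x^2}} \quad (|x| < 1) \qquad \text{and} \qquad p_\eta(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x e^{-\frac{x^2}{2}} & \text{for } x > 0 \end{cases}$$

Prove that the variable $\xi\eta$ is normally distributed.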

31. Let $\xi$ and $\eta$ be independent and let them have the density functions

$$p_\xi(x) = p_\eta(x) = \begin{cases} 0 & \text{for } x \le 0 \\ e^{-x} & \text{for } x > 0 \end{cases}$$

Prove that the ratio $\zeta = \dfrac{\xi}{\xi + \eta}$ is distributed uniformly on
the interval $(0, 1)$.


32. The random variables $\xi$ and $\eta$ are independent and are uniformly
distributed on the interval $(-1, 1)$. Compute the probability that the roots of
the equation $x^2 + \xi x + \eta = 0$ are real.

(The problems from 29 to 32 were communicated to me by M. I. Yadrenko.)



CHAPTER 5

Numerical Characteristics of Random Variables


In the preceding chapter we saw that the fullest description of a
random variable is given by its distribution function. Indeed, the
distribution function indicates at the same time both what values
the random variable can assume and with what probabilities. However,
in a number of cases we need to know much less about the random
variable; we want merely a general idea. Very important in the
theory of probability and its applications are certain constant
numbers that are obtained in accordance with specific rules from the
distribution functions of random variables. Of these constants, which
serve to give a general quantitative description of random variables,
of particular importance are the mathematical expectation, the
variance, and moments of various orders.

Sec. 26. Mathematical Expectation 

We begin by considering the following schematic example: suppose
that when firing from a certain gun, it is necessary to fire one
shell with a probability of $p_1$ to hit the target, two shells with a
probability $p_2$, three shells with a probability $p_3$, and so forth.
Also, it is known that $n$ shells are definitely sufficient to hit the
target. We thus know that

$$p_1 + p_2 + \ldots + p_n = 1$$

Now, how many shells, on the average, are needed to hit the target?
We reason as follows. Suppose that a very large number of firings are
carried out under the conditions stated above. Then on the basis of the
Bernoulli theorem we can assert that the relative number of firings in
which only one shell would suffice to hit the target is approximately
equal to $p_1$. In exactly the same way, two shells would be required in
approximately $100p_2\%$ of the firings, and so forth. Thus, "on the
average", approximately

$$1\cdot p_1 + 2\cdot p_2 + \ldots + n\cdot p_n$$

shells will be needed to hit one target.
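The averaging argument above can be tried numerically. The following is only a sketch under stated assumptions: the probabilities in `probs` are hypothetical values chosen for illustration, and the function names are not from the text. It compares the weighted sum $1\cdot p_1 + 2\cdot p_2 + \ldots + n\cdot p_n$ with the average over a large number of simulated firings.

```python
import random


def expected_shells(probs):
    """Return 1*p_1 + 2*p_2 + ... + n*p_n for probs = [p_1, ..., p_n]."""
    return sum(k * p for k, p in enumerate(probs, start=1))


def simulated_average(probs, firings, seed=0):
    """Average shell count over many simulated firings (seeded for repeatability)."""
    rng = random.Random(seed)
    shell_counts = list(range(1, len(probs) + 1))
    draws = rng.choices(shell_counts, weights=probs, k=firings)
    return sum(draws) / firings


# Hypothetical probabilities p_1, p_2, p_3; they must sum to one.
probs = [0.5, 0.3, 0.2]
exact = expected_shells(probs)
approx = simulated_average(probs, firings=100_000)
print(exact, approx)
```

With a large number of firings the simulated average should lie close to the weighted sum, which is what the appeal to the Bernoulli theorem suggests.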

Problems of this kind, which call for computing the average value
of a random variable, arise in a great variety of applications. That
is why a special constant, called mathematical expectation, is
introduced into probability theory. We shall first give a definition for
discrete random variables by proceeding from the foregoing example.
Let

$$x_1, x_2, \ldots, x_n, \ldots$$

denote possible values of a discrete random variable $\xi$, and let

$$p_1, p_2, \ldots, p_n, \ldots$$

denote the corresponding probabilities.

If the series $\sum_{n=1}^{\infty} x_n p_n$ converges absolutely, then its sum is
called the mathematical expectation (or, simply, expectation) of the
random variable $\xi$ and is denoted by $M\xi$.

For continuous random variables, it will be natural to give the
following definition: if a random variable $\xi$ is continuous and
$p(x)$ is its density function, then the expectation of $\xi$ is the
integral

$$M\xi = \int x\,p(x)\,dx \tag{1}$$

in those cases when the integral

$$\int |x|\,p(x)\,dx$$

exists.

For an arbitrary random variable $\xi$ with distribution function
$F(x)$, the expectation is the integral

$$M\xi = \int x\,dF(x) \tag{2}$$

Taking advantage of the Stieltjes integral, we can give a simple
geometrical interpretation of the notion of expectation: the
expectation is equal to the difference between the area bounded by the
$y$-axis, the straight line $y = 1$, and the curve $y = F(x)$ in the interval
$(0, +\infty)$ and the area bounded by the $x$-axis, the curve $y = F(x)$, and
the $y$-axis in the interval $(-\infty, 0)$. In Fig. 18 the appropriate areas
are shaded and the sign is indicated that is to be affixed to each area.
Let us point out, incidentally, that the geometrical illustration
permits us to write the expectation in the following form:

$$M\xi = -\int_{-\infty}^{0} F(x)\,dx + \int_{0}^{\infty} (1 - F(x))\,dx \tag{3}$$

This remark makes it possible, in many cases, to find the expectation
almost without any computations. For instance, the expectation of
the random variable distributed according to the law given at the end
of Sec. 22 is one half.
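Formula (3) lends itself to a quick numerical check. The sketch below is an illustration only: it assumes a variable uniformly distributed on the hypothetical interval $(-1, 3)$, and it approximates the two integrals in (3) by a midpoint rule; the function names are chosen here and are not from the text.

```python
def expectation_from_cdf(F, lower, upper, steps=10_000):
    """Approximate -int_{lower}^{0} F(x) dx + int_{0}^{upper} (1 - F(x)) dx.

    lower and upper should bracket the region where F differs from 0 and 1.
    """
    def midpoint_rule(g, a, b):
        h = (b - a) / steps
        return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

    negative_area = midpoint_rule(F, lower, 0.0)
    positive_area = midpoint_rule(lambda x: 1.0 - F(x), 0.0, upper)
    return positive_area - negative_area


def uniform_cdf(x, a=-1.0, b=3.0):
    """Distribution function of the uniform law on (a, b); a and b are assumptions."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


value = expectation_from_cdf(uniform_cdf, lower=-1.0, upper=3.0)
print(value)
```

For this law the result should agree with the midpoint of the interval, that is, with 1.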



Fig. 18 


Note that of the earlier considered random variables, the one
distributed in accordance with the Cauchy law (Example 5, Sec. 24)
does not have any expectation.

Let us now consider some examples. 

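Example 1. Find the expectation of the random variable $\xi$
distributed according to the normal law

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$

From formula (2) we find

$$M\xi = \frac{1}{\sigma\sqrt{2\pi}} \int x\, e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx$$

The change of variables $z = \dfrac{x-a}{\sigma}$ reduces the integral to the form

$$M\xi = \frac{1}{\sqrt{2\pi}}\int (\sigma z + a)\, e^{-\frac{z^2}{2}}\,dz = \frac{\sigma}{\sqrt{2\pi}}\int z\,e^{-\frac{z^2}{2}}\,dz + \frac{a}{\sqrt{2\pi}}\int e^{-\frac{z^2}{2}}\,dz$$

Since

$$\int e^{-\frac{z^2}{2}}\,dz = \sqrt{2\pi} \quad \text{and} \quad \int z\,e^{-\frac{z^2}{2}}\,dz = 0$$

it follows that

$$M\xi = a$$

We have obtained an important result that elucidates the
probabilistic meaning of one of the parameters defining the normal
law: in the normal law of distribution the parameter $a$ is equal to
the expectation.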


Example 2. Determine the expectation of the random variable $\xi$
uniformly distributed in the interval $(a, b)$.

We have

$$M\xi = \int_a^b x\,\frac{dx}{b-a} = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$$

We see that the expectation coincides with the midpoint of the
interval of possible values of the random variable.


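Example 3. Determine the expectation of the random variable $\xi$
which is distributed in accordance with the Poisson law

$$P\{\xi = k\} = \frac{a^k e^{-a}}{k!} \qquad (k = 0, 1, 2, \ldots)$$

We have

$$M\xi = \sum_{k=0}^{\infty} k\,\frac{a^k e^{-a}}{k!} = \sum_{k=1}^{\infty} \frac{a^k e^{-a}}{(k-1)!} = a e^{-a}\sum_{k=1}^{\infty} \frac{a^{k-1}}{(k-1)!} = a e^{-a}\sum_{k=0}^{\infty} \frac{a^k}{k!} = a$$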
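The series in Example 3 can also be summed numerically. The sketch below is illustrative only; the function name and the truncation at 200 terms are assumptions made here. It accumulates $k\,P\{\xi = k\}$ for a Poisson law and compares the result with the parameter.

```python
import math


def poisson_mean(a, terms=200):
    """Partial sum of k * a^k * e^{-a} / k! for k = 1, ..., terms - 1."""
    total = 0.0
    term = math.exp(-a)          # P{xi = 0}
    for k in range(1, terms):
        term *= a / k            # term is now a^k e^{-a} / k!
        total += k * term
    return total


mean_value = poisson_mean(2.5)
print(mean_value)
```

For moderate values of the parameter the truncated sum should agree with the parameter itself to within rounding error.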


If $F(x/B)$ is the conditional distribution function for a random
variable $\xi$, then we will call the integral

$$M(\xi/B) = \int x\,dF(x/B) \tag{4}$$

the conditional expectation of the random variable $\xi$ with respect
to the event $B$.

Let $B_1, B_2, \ldots, B_n$ be a complete group of mutually exclusive
events and $F(x/B_1), F(x/B_2), \ldots, F(x/B_n)$ the conditional
distribution functions of the variable $\xi$ corresponding to these events.
Let $F(x)$ denote the unconditional distribution function of $\xi$; using
the formula of total probability, we find

$$F(x) = \sum_{k=1}^{n} P(B_k)\,F(x/B_k)$$

Together with (4), this equation enables us to obtain the following
formula:

$$M\xi = \sum_{k=1}^{n} P(B_k)\,M(\xi/B_k)$$

which, obviously, may be written differently:

$$M\xi = M\{M(\xi/B_k)\} \tag{5}$$

The foregoing formula greatly simplifies in many cases the
calculation of expectations.

Example 4. A workman is operating $n$ machines of one type
arranged in a straight line at separations $a$ from one another
(Fig. 19). Assuming that the operator moves from machine to
machine in order of priority, find the average path length (the
expectation of the path length) between machines.

Number the machines from left to right, 1 to $n$, and denote by
$B_k$ the event that the operator is at the $k$th machine. Since all
the machines are of the same kind, the probability that the
next machine requiring the attention of the operator will be the
$i$th is equal to $1/n$. The path length $\lambda$ in this case is

$$\lambda = \begin{cases} (k - i)a & \text{for } k \ge i \\ (i - k)a & \text{for } k < i \end{cases}$$

By definition

$$M(\lambda/B_k) = \frac{1}{n}\left(\sum_{i=1}^{k}(k-i)a + \sum_{i=k+1}^{n}(i-k)a\right) = \frac{a}{n}\left(\frac{k(k-1)}{2} + \frac{(n-k)(n-k+1)}{2}\right) = \frac{a}{2n}\left[2k^2 - 2(n+1)k + n(n+1)\right]$$

The probability that the operator will be at the $k$th machine is $1/n$,
and so from formula (5) we find

$$M\lambda = \sum_{k=1}^{n} \frac{a}{2n^2}\left[2k^2 - 2(n+1)k + n(n+1)\right]$$

We know that

$$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$$

and so

$$M\lambda = \frac{a(n^2 - 1)}{3n} = \frac{l}{3}\left(1 + \frac{1}{n}\right)$$

where $l = (n-1)a$ denotes the distance between the end machines.
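The result of Example 4 can be checked by direct enumeration. The sketch below is a minimal check under the assumptions of the example (current machine and next machine each equally likely among the $n$ machines); the function names are chosen here, and the closed form used for comparison is $a(n^2-1)/(3n)$, which equals $\frac{l}{3}\left(1 + \frac{1}{n}\right)$ with $l = (n-1)a$.

```python
from fractions import Fraction


def mean_path_length(n, a=1):
    """Average of |k - i| * a over all n*n equally likely pairs (k, i)."""
    total = Fraction(0)
    for k in range(1, n + 1):
        for i in range(1, n + 1):
            total += Fraction(abs(k - i), n * n) * a
    return total


def closed_form(n, a=1):
    """a * (n^2 - 1) / (3n), i.e. (l / 3) * (1 + 1/n) with l = (n - 1) * a."""
    return Fraction(n * n - 1, 3 * n) * a


for n in (2, 5, 10):
    print(n, mean_path_length(n), closed_form(n))
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than approximately.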



The expectation of an $n$-dimensional random variable $(\xi_1, \xi_2, \ldots, \xi_n)$
is defined as the collection of $n$ integrals:

$$a_k = \int\!\cdots\!\int x_k\,dF(x_1, \ldots, x_k, \ldots, x_n) = \int x\,dF_k(x) = M\xi_k$$

where $F_k(x)$ is the distribution function of the variable $\xi_k$.*


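Example 5. The density function of a two-dimensional random
variable $(\xi_1, \xi_2)$ is given by the formula (two-dimensional normal
distribution)

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{(x_1-a)^2}{\sigma_1^2} - \frac{2r(x_1-a)(x_2-b)}{\sigma_1\sigma_2} + \frac{(x_2-b)^2}{\sigma_2^2}\right]\right\}$$

Find its expectation.

By definition,

$$a_1 = \iint x_1\,p(x_1, x_2)\,dx_1\,dx_2 = \int x_1\,p_1(x_1)\,dx_1$$

and

$$a_2 = \iint x_2\,p(x_1, x_2)\,dx_1\,dx_2 = \int x_2\,p_2(x_2)\,dx_2$$

In Example 2 of Sec. 23 we saw that

$$p_1(x_1) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left\{-\frac{(x_1-a)^2}{2\sigma_1^2}\right\}, \qquad p_2(x_2) = \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left\{-\frac{(x_2-b)^2}{2\sigma_2^2}\right\}$$

and so from the results of Example 1 of this section we find

$$a_1 = a \quad \text{and} \quad a_2 = b$$

We have also been able to find the probabilistic meaning of
parameters $a$ and $b$ for a two-dimensional normal distribution.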


Sec. 27. Variance

The variance (also called dispersion) of a random variable $\xi$ is
defined as the expectation of the square of the deviation of $\xi$
from $M\xi$. Let us agree to denote the variance by the symbol $D\xi$.


* We do not give a formal definition of an n-dimensional Stieltjes integral,
firstly, because we will actually only consider discrete and continuous random
variables and, secondly, because probability theory does not require a general
theory of Stieltjes integrals but the theory of the abstract Lebesgue integral
(for more details see Chapter 1 of the monograph Limit Distributions for Sums
of Independent Random Variables by Gnedenko and Kolmogorov, 1949, in
Russian).



Then by definition

$$D\xi = M(\xi - M\xi)^2 = \int_0^{\infty} x\,dF_\eta(x) \tag{1}$$

where $F_\eta(x)$ denotes the distribution function of the random
variable $\eta = (\xi - M\xi)^2$.

For practical calculations, use is made of a different formula,
namely

$$D\xi = \int (z - M\xi)^2\,dF_\xi(z) \tag{2}$$

That formulas (1) and (2) are equivalent follows directly from
the following proposition.

Theorem. If $F_\xi(x)$ is the distribution function of a variable $\xi$
and $f(x)$ is a continuous function, then

$$Mf(\xi) = \int f(x)\,dF_\xi(x)$$

We shall confine the proof of this theorem only to the most
elementary special case: $f(x) = (x - a)^k$. Using the notation

$$G(x) = P\{(\xi - a)^k < x\}$$

we find by definition that

$$M(\xi - a)^k = \int_{-\infty}^{\infty} x\,dG(x)$$

If $k$ is an odd number, then $(\xi - a)^k$ is a nondecreasing function
of $\xi$ and therefore

$$G(x) = P\{(\xi - a)^k < x\} = P\{\xi - a < \sqrt[k]{x}\} = P\{\xi < a + \sqrt[k]{x}\} = F(a + \sqrt[k]{x})$$

Thus, for odd $k$,

$$M(\xi - a)^k = \int x\,dF(a + \sqrt[k]{x})$$

It is easy to see that by substituting $z = a + \sqrt[k]{x}$ (and then writing
$x$ again for the variable of integration) we reduce this integral to the form

$$M(\xi - a)^k = \int_{-\infty}^{\infty} (x - a)^k\,dF(x)$$

But if $k$ is even, then $(\xi - a)^k$ is a nonnegative quantity and,
hence, $G(x) = 0$ for $x \le 0$. For $x > 0$

$$G(x) = P\{(\xi - a)^k < x\} = P\{a - \sqrt[k]{x} < \xi < a + \sqrt[k]{x}\} = F(a + \sqrt[k]{x}) - F(a - \sqrt[k]{x} + 0)$$

Thus, for even $k$

$$M(\xi - a)^k = \int_0^{\infty} x\,dF(a + \sqrt[k]{x}) - \int_0^{\infty} x\,dF(a - \sqrt[k]{x} + 0)$$

By the substitutions $z = a + \sqrt[k]{x}$ in the first integral and
$z = a - \sqrt[k]{x}$ in the second one, we reduce $M(\xi - a)^k$ to the form

$$M(\xi - a)^k = \int_{-\infty}^{\infty} (x - a)^k\,dF(x)$$

For practical purposes, it is useful to write formula (2) in a
different form. Since

$$(z - M\xi)^2 = z^2 - 2zM\xi + (M\xi)^2 \quad \text{and} \quad M\xi = \int z\,dF_\xi(z)$$

it follows that formula (2) may be written differently:

$$D\xi = \int z^2\,dF_\xi(z) - \left(\int z\,dF_\xi(z)\right)^2 = M\xi^2 - (M\xi)^2 \tag{3}$$

Since the variance is a nonnegative quantity, from this relation
we derive

$$\left(\int z\,dF_\xi(z)\right)^2 \le \int z^2\,dF_\xi(z)$$

This inequality is a particular case of the well-known
Bunyakovsky-Cauchy inequality (also called the Schwarz inequality).
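For a discrete variable the agreement between the definition of variance and formula (3) can be seen by direct computation. The sketch below is illustrative only: the values and probabilities are hypothetical, and the function names are chosen here.

```python
from fractions import Fraction


def mean(values, probs):
    """Expectation of a discrete variable: sum of x_n * p_n."""
    return sum(p * x for x, p in zip(values, probs))


def variance_by_definition(values, probs):
    """M(xi - M xi)^2 computed directly from the definition."""
    m = mean(values, probs)
    return sum(p * (x - m) ** 2 for x, p in zip(values, probs))


def variance_by_formula_3(values, probs):
    """M xi^2 - (M xi)^2, as in formula (3)."""
    second_moment = sum(p * x * x for x, p in zip(values, probs))
    return second_moment - mean(values, probs) ** 2


# Hypothetical distribution: values 0, 1, 4 with probabilities 1/2, 1/4, 1/4.
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
print(variance_by_definition(values, probs), variance_by_formula_3(values, probs))
```

Both routes give the same rational number, and the nonnegativity of the result is the inequality $(M\xi)^2 \le M\xi^2$ in this special case.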

Like expectation, variance does not exist for all random
variables. For instance, the Cauchy law which we considered earlier
(see Example 5, Sec. 24) does not have a finite variance.

Let us consider some examples in computing variance.


Example 1. Find the variance of a random variable $\xi$ uniformly
distributed in the interval $(a, b)$.

In our example,

$$M\xi^2 = \int_a^b x^2\,\frac{dx}{b-a} = \frac{b^3 - a^3}{3(b-a)} = \frac{b^2 + ab + a^2}{3}$$

In the preceding section,

$$M\xi = \frac{a+b}{2}$$

was found. Thus,

$$D\xi = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}$$


We see that the variance depends only on the length of the 
interval (a, b) and is an increasing function of the length. The 
greater the interval of values which the random variable assumes, 
i.e., the more the values are scattered, the greater the variance. 
Variance is thus a measure of the spread or dispersion of the values 
of a random variable about the expectation. 

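Example 2. Find the variance of the random variable $\xi$
distributed in accord with the normal law

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$

We know that $M\xi = a$, and therefore

$$D\xi = \int (x-a)^2\,p(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int (x-a)^2\, e^{-\frac{(x-a)^2}{2\sigma^2}}\,dx$$

Changing variables in the integral, put

$$z = \frac{x-a}{\sigma}$$

then

$$D\xi = \frac{\sigma^2}{\sqrt{2\pi}}\int z^2\,e^{-\frac{z^2}{2}}\,dz$$

Integrating by parts we find

$$\int z^2\,e^{-\frac{z^2}{2}}\,dz = \left[-z\,e^{-\frac{z^2}{2}}\right]_{-\infty}^{\infty} + \int e^{-\frac{z^2}{2}}\,dz = \sqrt{2\pi}$$

And so, finally,

$$D\xi = \sigma^2$$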




We have thus found the probabilistic meaning of the second
parameter that determines the normal law. We see that the normal law
of distribution is fully determined by the expectation and the variance.
This is widely used in theoretical investigations.

It will be noted that in the case of a normally distributed random
variable the variance permits one to judge the dispersion of
its values. Though for any positive value of the variance a normally
distributed random variable can assume all real values, still, the
smaller the variance, the less the dispersion of the values of the
variable. Here, the probabilities of values close to the expectation will
be greater. We noted this fact in the preceding chapter when we first
examined the normal law.

Example 3. Find the variance of the random variable $\lambda$ considered
in Example 4, Sec. 26.

Retaining the notations of Example 4, we find

$$M(\lambda^2/B_k) = \frac{1}{n}\left(\sum_{i=1}^{k}(k-i)^2a^2 + \sum_{i=k+1}^{n}(i-k)^2a^2\right) = \frac{a^2}{6n}\left[(k-1)k(2k-1) + (n-k)(n-k+1)(2n-2k+1)\right] = \frac{a^2}{6}\left[6k^2 - 6(n+1)k + (2n+1)(n+1)\right]$$

and, consequently,

$$M\lambda^2 = \frac{1}{n}\sum_{k=1}^{n} M(\lambda^2/B_k) = \frac{a^2}{6}\left[(n+1)(2n+1) - 3(n+1)^2 + (n+1)(2n+1)\right] = \frac{a^2}{6}(n^2 - 1)$$

From this it follows that

$$D\lambda = M\lambda^2 - (M\lambda)^2 = \frac{a^2}{6}(n^2-1) - \frac{a^2(n^2-1)^2}{9n^2} = \frac{a^2(n^2-1)(n^2+2)}{18n^2} = \frac{l^2}{18}\left(1 + \frac{2}{n} + \frac{4}{n^2} + \frac{6}{n^2(n-1)}\right)$$
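The variance found in Example 3 can be checked in the same way as the expectation, by enumerating all equally likely pairs of machines. This is a sketch under the assumptions of Example 4 of Sec. 26, with the separation taken as $a = 1$ for convenience; the function names are chosen here for illustration.

```python
from fractions import Fraction


def path_variance(n):
    """Variance of |k - i| over the n*n equally likely pairs (k, i), with a = 1."""
    pairs = n * n
    first = Fraction(sum(abs(k - i) for k in range(n) for i in range(n)), pairs)
    second = Fraction(sum((k - i) ** 2 for k in range(n) for i in range(n)), pairs)
    return second - first * first


def closed_form_variance(n):
    """(n^2 - 1)(n^2 + 2) / (18 n^2), the closed form with a = 1."""
    return Fraction((n * n - 1) * (n * n + 2), 18 * n * n)


for n in (2, 3, 8):
    print(n, path_variance(n), closed_form_variance(n))
```

For a general separation $a$ both quantities are simply multiplied by $a^2$.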

The variance (or the covariance matrix) of an $n$-dimensional random
variable $(\xi_1, \xi_2, \ldots, \xi_n)$ is defined as the set of $n^2$ constants
given by the formula

$$b_{jk} = \int\!\cdots\!\int (x_j - M\xi_j)(x_k - M\xi_k)\,dF(x_1, \ldots, x_n) \tag{4}$$

$$(1 \le k \le n,\ 1 \le j \le n)$$

Since for any real $t_j$ $(1 \le j \le n)$

$$\int\!\cdots\!\int \left\{\sum_{j=1}^{n} t_j(x_j - M\xi_j)\right\}^2 dF(x_1, \ldots, x_n) = \sum_{j=1}^{n}\sum_{k=1}^{n} b_{jk}\,t_j t_k \ge 0$$

it follows that, as we know from the theory of quadratic forms,
the quantities $b_{jk}$ satisfy the inequalities

$$\begin{vmatrix} b_{11} & b_{12} & \ldots & b_{1k} \\ b_{21} & b_{22} & \ldots & b_{2k} \\ \ldots & \ldots & \ldots & \ldots \\ b_{k1} & b_{k2} & \ldots & b_{kk} \end{vmatrix} \ge 0 \quad \text{for } k = 1, 2, \ldots, n$$

It is obvious that

$$b_{kk} = D\xi_k$$

The quantities $b_{jk}$ for $k \ne j$ are called the mixed central
moments of the second order of the variables $\xi_j$ and $\xi_k$; obviously,
$b_{jk} = b_{kj}$. In the statistical literature, $b_{jk}$ is often called the
covariance of $\xi_j$ and $\xi_k$ and is denoted by the symbol $\operatorname{cov}(\xi_j, \xi_k)$.

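The following function of second-order moments

$$r_{ij} = \frac{b_{ij}}{\sqrt{b_{ii}\,b_{jj}}}$$

is called the correlation coefficient of the variables $\xi_i$ and $\xi_j$.

The magnitude of the correlation coefficient lies between $-1$ and $+1$.

The correlation coefficient $r_{ij}$ assumes the values $\pm 1$ only when
$\xi_i$ and $\xi_j$ are connected by a linear relationship.

Indeed, since

$$D\left(\frac{\xi_i}{\sqrt{b_{ii}}} \pm \frac{\xi_j}{\sqrt{b_{jj}}}\right) = 2(1 \pm r_{ij}) \ge 0$$

it follows that $-1 \le r_{ij} \le 1$.

The equality $r_{ij} = \pm 1$ is possible if and only if

$$D\left(\frac{\xi_i}{\sqrt{b_{ii}}} \mp \frac{\xi_j}{\sqrt{b_{jj}}}\right) = 0$$

Now the variance can be zero only for random variables which
assume a certain constant value with probability one. Thus, if
$r_{ij} = 1$, then

$$\frac{\xi_i}{\sqrt{b_{ii}}} - \frac{\xi_j}{\sqrt{b_{jj}}} = c$$

and, hence,

$$\xi_i = \sqrt{\frac{b_{ii}}{b_{jj}}}\,\xi_j + c\sqrt{b_{ii}}$$

In exactly the same way, if $r_{ij} = -1$, then

$$\xi_i = -\sqrt{\frac{b_{ii}}{b_{jj}}}\,\xi_j + c'\sqrt{b_{ii}}$$

where $c$ and $c'$ are constants.

By straightforward computation it is proved that if random
variables are linearly related, their correlation coefficient is equal to
plus or minus unity.

It is easy to compute that for independent random variables $\xi_i$ and
$\xi_j$ the correlation coefficient is zero.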

The converse conclusion is not true. The correlation coefficient
of the variables $\xi$ and $\eta$ may be zero even though they are dependent.
To illustrate, suppose $\xi$ is distributed symmetrically about
the point $x = 0$ and has a finite fourth moment, and let $\eta = \xi^2$. Then
$M\xi = 0$, $M\xi\eta = M\xi^3 = 0$, consequently, $M(\xi - M\xi)(\eta - M\eta) = 0$
and, hence, $r_{\xi\eta} = 0$.
We can thus say that the correlation coefficient is a measure of the
strength of the relationship (linear relation) between the variables
$\xi_i$ and $\xi_j$.
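A small discrete case makes the last remark concrete. The sketch below assumes, purely for illustration, that $\xi$ takes the values $-1, 0, 1$ with equal probabilities and that $\eta = \xi^2$; it computes the covariance exactly and compares one joint probability with the product of the corresponding marginal probabilities.

```python
from fractions import Fraction

# Assumed example: xi uniform on {-1, 0, 1}, eta = xi**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

mean_xi = sum(p * x for x in support)
mean_eta = sum(p * x * x for x in support)

# Mixed central moment M(xi - M xi)(eta - M eta).
covariance = sum(p * (x - mean_xi) * (x * x - mean_eta) for x in support)

# Dependence: P{xi = 0, eta = 0} versus P{xi = 0} * P{eta = 0}.
joint_zero = p                  # eta = 0 exactly when xi = 0
product_of_marginals = p * p    # both marginal probabilities equal 1/3

print(covariance, joint_zero, product_of_marginals)
```

The covariance vanishes while the joint probability differs from the product of the marginals, so the two variables are uncorrelated yet dependent.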


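Example 4. Find the variance of the two-dimensional random
variable $(\xi_1, \xi_2)$ distributed in accordance with the nondegenerate
normal law

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{-\frac{1}{2(1-r^2)}\left[\frac{(x-a)^2}{\sigma_1^2} - \frac{2r(x-a)(y-b)}{\sigma_1\sigma_2} + \frac{(y-b)^2}{\sigma_2^2}\right]\right\}$$

According to formula (4) and the results of Example 2 of this
section and Example 1 of Sec. 26 we find

$$D\xi_1 = \sigma_1^2, \qquad D\xi_2 = \sigma_2^2$$

Further,

$$b_{12} = b_{21} = \iint (x-a)(y-b)\,p(x, y)\,dx\,dy = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \int e^{-\frac{(y-b)^2}{2\sigma_2^2}}\,dy \int (x-a)(y-b)\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x-a}{\sigma_1} - r\,\frac{y-b}{\sigma_2}\right)^2\right\}dx$$

By the substitutions

$$z = \frac{1}{\sqrt{1-r^2}}\left(\frac{x-a}{\sigma_1} - r\,\frac{y-b}{\sigma_2}\right), \qquad t = \frac{y-b}{\sigma_2}$$

the expression for $b_{12}$ is reduced to the form

$$b_{12} = b_{21} = \frac{1}{2\pi}\iint \left(\sigma_1\sigma_2\sqrt{1-r^2}\,tz + r\sigma_1\sigma_2\,t^2\right) e^{-\frac{t^2}{2} - \frac{z^2}{2}}\,dz\,dt = \frac{r\sigma_1\sigma_2}{2\pi}\int t^2 e^{-\frac{t^2}{2}}\,dt \int e^{-\frac{z^2}{2}}\,dz + \frac{\sigma_1\sigma_2\sqrt{1-r^2}}{2\pi}\int t\,e^{-\frac{t^2}{2}}\,dt \int z\,e^{-\frac{z^2}{2}}\,dz = r\sigma_1\sigma_2$$

Whence we find

$$r = \frac{\iint (x-a)(y-b)\,p(x, y)\,dx\,dy}{\sigma_1\sigma_2} = \frac{M(\xi_1 - M\xi_1)(\xi_2 - M\xi_2)}{\sqrt{D\xi_1\,D\xi_2}}$$

Summarizing, then, the parameter $r$ of a two-dimensional normal
distribution is the correlation coefficient of the components $(\xi_1, \xi_2)$.
We see that the two-dimensional normal law, as in the one-dimensional
case, is completely determined by specifying the expectation and
the variance; that is, it is determined by specifying five quantities:
$M\xi_1$, $M\xi_2$, $D\xi_1$, $D\xi_2$ and $r$.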





Sec. 28. Theorems on Expectation and Variance 

Theorem 1. The expectation of a constant is equal to that constant. 

Proof. We can regard the constant $C$ as a discrete random variable
which can take on only one value $C$ with probability one; and so

$$MC = C \cdot 1 = C$$

Theorem 2. The expectation of a sum of random variables is equal
to the sum of their expectations:

$$M(\xi + \eta) = M\xi + M\eta$$

Proof. First consider the case of the discrete random variables
$\xi$ and $\eta$. Let $a_1, a_2, \ldots, a_n, \ldots$ be possible values of the variable $\xi$
and $p_1, p_2, \ldots, p_n, \ldots$ be the probabilities of these values; let
$b_1, b_2, \ldots, b_k, \ldots$ be possible values of the variable $\eta$ and
$q_1, q_2, \ldots, q_k, \ldots$ the probabilities of these values. The possible values
of the variable $\xi + \eta$ have the form $a_n + b_k$ $(k, n = 1, 2, \ldots)$. Denote by
$p_{nk}$ the probability that $\xi$ will assume the value $a_n$ and $\eta$ the value
$b_k$. By the definition of expectation,

$$M(\xi + \eta) = \sum_{n,k=1}^{\infty} (a_n + b_k)\,p_{nk} = \sum_{n=1}^{\infty}\sum_{k=1}^{\infty} (a_n + b_k)\,p_{nk} = \sum_{n=1}^{\infty} a_n\left(\sum_{k=1}^{\infty} p_{nk}\right) + \sum_{k=1}^{\infty} b_k\left(\sum_{n=1}^{\infty} p_{nk}\right)$$

Since by the theorem of total probability

$$\sum_{k=1}^{\infty} p_{nk} = p_n \quad \text{and} \quad \sum_{n=1}^{\infty} p_{nk} = q_k$$

it follows that

$$\sum_{n=1}^{\infty} a_n \sum_{k=1}^{\infty} p_{nk} = \sum_{n=1}^{\infty} a_n p_n = M\xi$$

and

$$\sum_{k=1}^{\infty} b_k \sum_{n=1}^{\infty} p_{nk} = \sum_{k=1}^{\infty} b_k q_k = M\eta$$

The proof of the theorem for the case of discrete summands is
complete.

The same applies to the case when there is a two-dimensional
density function $p(x, y)$ of a random variable $(\xi, \eta)$; from formula (3)
of Sec. 24 we find

$$M\zeta = M(\xi + \eta) = \int x\,dF_\zeta(x) = \int x\left(\int p(z, x-z)\,dz\right)dx = \iint x\,p(z, x-z)\,dz\,dx = \iint (z + y)\,p(z, y)\,dz\,dy =$$

$$= \iint z\,p(z, y)\,dz\,dy + \iint y\,p(z, y)\,dz\,dy = \int z\,p_\xi(z)\,dz + \int y\,p_\eta(y)\,dy = M\xi + M\eta$$

Theorem 2 will be proved for the general case in Sec. 29.

Corollary 1. The expectation of the sum of a finite number of
random variables is equal to the sum of their expectations:

$$M(\xi_1 + \xi_2 + \ldots + \xi_n) = M\xi_1 + M\xi_2 + \ldots + M\xi_n$$

Indeed, by virtue of the theorem we have just proved

$$M(\xi_1 + \xi_2 + \ldots + \xi_n) = M\xi_1 + M(\xi_2 + \xi_3 + \ldots + \xi_n) = M\xi_1 + M\xi_2 + M(\xi_3 + \ldots + \xi_n) = \ldots = M\xi_1 + M\xi_2 + \ldots + M\xi_n$$

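Corollary 2. Consider the sum

$$\zeta_\mu = \xi_1 + \xi_2 + \ldots + \xi_\mu$$

where $\mu$ is a random variable that takes on only integral values. If
the random variables $\xi_1, \xi_2, \ldots$ do not depend on $\mu$, the expectation
of $\mu$ is finite, and the series

$$\sum_{k=1}^{\infty} M|\xi_k|\,P\{\mu \ge k\}$$

converges, then the expectation of the sum exists and is equal to

$$M\zeta_\mu = \sum_{j=1}^{\infty} M\xi_j\,P\{\mu \ge j\}$$

Proof. Indeed, provided that $\mu = k$, the conditional expectation is

$$M\{\zeta_\mu/\mu = k\} = M\xi_1 + M\xi_2 + \ldots + M\xi_k$$

The unconditional expectation is

$$M\zeta_\mu = \sum_{k=1}^{\infty} P\{\mu = k\}\sum_{j=1}^{k} M\xi_j = \sum_{j=1}^{\infty} M\xi_j\sum_{k=j}^{\infty} P\{\mu = k\} = \sum_{j=1}^{\infty} M\xi_j\,P\{\mu \ge j\}$$

If the summands $\xi_1, \xi_2, \xi_3, \ldots$ are identically distributed, that
is, if $P\{\xi_1 < x\} = P\{\xi_2 < x\} = \ldots = F(x)$, then

$$M\zeta_\mu = M\xi_1 \cdot M\mu$$

Indeed,

$$M\zeta_\mu = \sum_{k=1}^{\infty} P\{\mu = k\}\sum_{j=1}^{k} M\xi_j = \sum_{k=1}^{\infty} k\,P\{\mu = k\}\,M\xi_1 = M\xi_1 \cdot M\mu$$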

Example 1. The number of cosmic particles striking a given area
is a random variable $\mu$ that obeys the Poisson law with parameter $a$;
each of the particles carries energy $\xi$ that depends on chance. Find
the mean energy $\zeta$ acquired by the area in unit time.

According to Corollary 2 we have

$$M\zeta = M\xi \cdot M\mu = a\,M\xi$$
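The formula of Corollary 2 can be evaluated numerically for a Poisson number of summands. The sketch below is illustrative only: the mean energy `mean_xi`, the parameter `a`, and the truncation point `kmax` are assumptions made here, and all summands are taken to have the same expectation.

```python
import math


def poisson_pmf(a, kmax):
    """Probabilities P{mu = k} for k = 0, ..., kmax - 1 under the Poisson law."""
    pmf = [math.exp(-a)]
    for k in range(1, kmax):
        pmf.append(pmf[-1] * a / k)
    return pmf


def random_sum_mean(mean_xi, a, kmax=200):
    """Truncated series sum_j M(xi_j) * P{mu >= j} with equal means mean_xi."""
    pmf = poisson_pmf(a, kmax)
    total = 0.0
    tail = 0.0
    # Walk downward so that tail equals P{mu >= j} (up to truncation at kmax).
    for j in range(kmax - 1, 0, -1):
        tail += pmf[j]
        total += mean_xi * tail
    return total


value = random_sum_mean(mean_xi=2.0, a=3.0)
print(value)
```

With identically distributed summands the series should reproduce the product of the mean of one summand and the mean number of summands.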




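Example 2. A target is fired at until it has been hit $n$ times. Assuming
that the shots are fired independently of one another and the probability
of a hit in each shot is $p$, find the expectation of shell consumption.

Denote by $\xi_k$ the number of shells expended from the $(k-1)$st
hit to the $k$th hit. It is obvious that the consumption of shells
in $n$ hits is

$$\xi = \xi_1 + \xi_2 + \ldots + \xi_n$$

and, consequently,

$$M\xi = M\xi_1 + M\xi_2 + \ldots + M\xi_n$$

But

$$M\xi_1 = M\xi_2 = \ldots = M\xi_n$$

and

$$M\xi_1 = \sum_{k=1}^{\infty} k\,q^{k-1}p = \frac{p}{(1-q)^2} = \frac{1}{p} \qquad (q = 1 - p)$$

consequently,

$$M\xi = \frac{n}{p}$$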



Theorem 3. The expectation of the product of the independent
random variables $\xi$ and $\eta$ is equal to the product of their expectations.

Proof. If the variables $\xi$ and $\eta$ are discrete, $a_1, a_2, \ldots, a_k, \ldots$ are
the possible values of $\xi$ and $p_1, p_2, \ldots, p_k, \ldots$ are the probabilities
of these values, $b_1, b_2, \ldots, b_n, \ldots$ are the possible values of $\eta$ and
$q_1, q_2, \ldots, q_n, \ldots$ are the probabilities of these values, then the
probability that $\xi$ will assume the value $a_k$ and $\eta$ will assume the value
$b_n$ is $p_k q_n$. By the definition of expectation

$$M\xi\eta = \sum_{k,n} a_k b_n\,p_k q_n = \sum_{k=1}^{\infty}\sum_{n=1}^{\infty} a_k b_n\,p_k q_n = \left(\sum_{k=1}^{\infty} a_k p_k\right)\left(\sum_{n=1}^{\infty} b_n q_n\right) = M\xi\,M\eta$$

Only slightly more complicated is the proof for the case of
continuous variables. This is left to the reader.

Proof of the theorem in the general case will be taken up in
Sec. 29.

Corollary 1. A constant factor may be removed from under the
sign of expectation:

$$MC\xi = CM\xi$$

This assertion is obvious, since no matter what the variable $\xi$,
the constant $C$ and the variable $\xi$ may be regarded as independent
variables.

Theorem 4. The variance of a constant is zero.

Proof. According to Theorem 1,

$$DC = M(C - MC)^2 = M(C - C)^2 = M0 = 0$$

Theorem 5. If $C$ is constant, then

$$DC\xi = C^2 D\xi$$

Proof. By virtue of the corollary to Theorem 3,

$$DC\xi = M[C\xi - MC\xi]^2 = M[C\xi - CM\xi]^2 = MC^2[\xi - M\xi]^2 = C^2 M[\xi - M\xi]^2 = C^2 D\xi$$

Theorem 6. The variance of the sum of the independent random
variables $\xi$ and $\eta$ is equal to the sum of their variances:

$$D(\xi + \eta) = D\xi + D\eta$$

Proof. Indeed,

$$D(\xi + \eta) = M[\xi + \eta - M(\xi + \eta)]^2 = M[(\xi - M\xi) + (\eta - M\eta)]^2 = D\xi + D\eta + 2M(\xi - M\xi)(\eta - M\eta)$$

The variables $\xi$ and $\eta$ are independent, and so the quantities
$\xi - M\xi$ and $\eta - M\eta$ are also independent; whence

$$M(\xi - M\xi)(\eta - M\eta) = M(\xi - M\xi) \cdot M(\eta - M\eta) = 0$$

Corollary 1. If $\xi_1, \xi_2, \ldots, \xi_n$ are random variables, each of which
is independent of the sum of the preceding ones, then

$$D(\xi_1 + \xi_2 + \ldots + \xi_n) = D\xi_1 + D\xi_2 + \ldots + D\xi_n$$



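Corollary 2. The variance of the sum of a finite number of
pairwise independent random variables $\xi_1, \xi_2, \ldots, \xi_n$ is equal to the
sum of their variances.

Proof. Indeed,

$$D(\xi_1 + \xi_2 + \ldots + \xi_n) = M\left(\sum_{k=1}^{n}(\xi_k - M\xi_k)\right)^2 = M\sum_{k=1}^{n}\sum_{j=1}^{n}(\xi_k - M\xi_k)(\xi_j - M\xi_j) =$$

$$= \sum_{k=1}^{n}\sum_{j=1}^{n} M(\xi_k - M\xi_k)(\xi_j - M\xi_j) = \sum_{k=1}^{n} D\xi_k + \sum_{k \ne j} M(\xi_k - M\xi_k)(\xi_j - M\xi_j)$$

From the independence of any pair of variables $\xi_k$ and $\xi_j$ $(k \ne j)$
it follows that for $k \ne j$

$$M(\xi_k - M\xi_k)(\xi_j - M\xi_j) = 0$$

This quite obviously completes the proof.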

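Example 3. The ratio

$$\frac{\xi - M\xi}{\sqrt{D\xi}}$$

is called the normalized deviation of a random variable. Prove that

$$D\left(\frac{\xi - M\xi}{\sqrt{D\xi}}\right) = 1$$

Indeed, $\xi$ and $M\xi$, considered as random variables, are
independent and for this reason, by virtue of Theorems 5 and 6,

$$D\left(\frac{\xi - M\xi}{\sqrt{D\xi}}\right) = \frac{D\xi + D(-M\xi)}{D\xi} = \frac{D\xi}{D\xi} = 1$$

Example 4. If $\xi$ and $\eta$ are independent random variables, then

$$D(\xi - \eta) = D\xi + D\eta$$

Indeed, by virtue of Theorems 5 and 6, $D(-\eta) = (-1)^2 D\eta = D\eta$ and
$D(\xi - \eta) = D\xi + D(-\eta) = D\xi + D\eta$.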


Example 5. Theorems 2 and 6 permit of an extremely simple
computation of the expectation and variance of the number $\mu$ of
occurrences of an event $A$ in $n$ independent trials.

Let $p_k$ be the probability of an occurrence of the event $A$ in the
$k$th trial.

Denote by $\mu_k$ the number of occurrences of the event $A$ in the
$k$th trial. It is obvious that $\mu_k$ is a random variable that takes on
values 0 and 1 with probabilities $q_k = 1 - p_k$ and $p_k$, respectively.

Thus, the variable $\mu$ may be represented in the form of a sum:

$$\mu = \mu_1 + \mu_2 + \ldots + \mu_n$$

Since

$$M\mu_k = 0 \cdot q_k + 1 \cdot p_k = p_k$$

$$D\mu_k = M\mu_k^2 - (M\mu_k)^2 = 0 \cdot q_k + 1 \cdot p_k - p_k^2 = p_k(1 - p_k) = p_k q_k$$

the proved theorems permit concluding that

$$M\mu = p_1 + p_2 + \ldots + p_n$$

and

$$D\mu = p_1 q_1 + p_2 q_2 + \ldots + p_n q_n$$

For the case of the Bernoulli scheme, $p_k = p$ and, hence,

$$M\mu = np \quad \text{and} \quad D\mu = npq$$

We then note that this gives

$$M\frac{\mu}{n} = p, \qquad D\frac{\mu}{n} = \frac{pq}{n}$$
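The two formulas of Example 5 can be confirmed for a small number of trials by listing every outcome. The sketch below is a check by brute-force enumeration; the probabilities in `ps` are hypothetical, and the function name is chosen here for illustration.

```python
from fractions import Fraction
from itertools import product


def count_moments(ps):
    """Exact mean and variance of the number of occurrences in independent trials."""
    mean = Fraction(0)
    second = Fraction(0)
    for outcome in product((0, 1), repeat=len(ps)):
        prob = Fraction(1)
        for occurred, p in zip(outcome, ps):
            prob *= p if occurred else 1 - p
        occurrences = sum(outcome)
        mean += prob * occurrences
        second += prob * occurrences * occurrences
    return mean, second - mean * mean


# Hypothetical probabilities p_1, ..., p_4 of the event in the successive trials.
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]
mean_mu, var_mu = count_moments(ps)
print(mean_mu, var_mu)
```

The enumeration has $2^n$ terms, so it is practical only for small $n$; its purpose is merely to show that the sums $p_1 + \ldots + p_n$ and $p_1 q_1 + \ldots + p_n q_n$ come out exactly.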

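Example 6. Let us find the expectation and the variance of the
number of occurrences of an event $E$ in $n$ trials connected in a
homogeneous Markov chain.

As before, we denote the number of occurrences of the event $E$
in the $k$th trial by $\mu_k$. The number of occurrences of the event in
$n$ trials is equal to the sum

$$\mu = \mu_1 + \mu_2 + \ldots + \mu_n$$

But

$$M\mu = \sum_{k=1}^{n} M\mu_k = \sum_{k=1}^{n} p_k$$

According to formula (1') of Sec. 20,

$$p_k = p + (p_1 - p)\,\delta^{k-1}$$

Thus,

$$M\mu = np + \sum_{k=1}^{n}(p_1 - p)\,\delta^{k-1} = np + (p_1 - p)\,\frac{1 - \delta^n}{1 - \delta}$$

By definition,

$$D\mu = M(\mu - M\mu)^2 = M\left(\sum_{k=1}^{n}(\mu_k - p_k)\right)^2 = \sum_{k=1}^{n} M(\mu_k - p_k)^2 + 2\sum_{j>i} M(\mu_i - p_i)(\mu_j - p_j)$$

But

$$M(\mu_k - p_k)^2 = D\mu_k = p_k q_k = pq + (q - p)(p_1 - p)\,\delta^{k-1} - (p_1 - p)^2\,\delta^{2k-2}$$

where

$$q_k = 1 - p_k = q - (p_1 - p)\,\delta^{k-1}$$

Further,

$$M(\mu_i - p_i)(\mu_j - p_j) = M\mu_i\mu_j - p_i p_j$$

Since the probability of the equality $\mu_i\mu_j = 1$ is obviously $p_i\,p_j^{(i)}$,
where $p_j^{(i)}$ $(j > i)$ denotes the probability of $E$ in the $j$th trial given
that $E$ occurred in the $i$th trial, it follows that

$$M(\mu_i - p_i)(\mu_j - p_j) = p_i\left(p_j^{(i)} - p_j\right)$$

Taking advantage of formulas (1') and (2') of Sec. 20, we find

$$M(\mu_i - p_i)(\mu_j - p_j) = pq\,\delta^{j-i} + (p_1 - p)(q - p)\,\delta^{j-1} - (p_1 - p)^2\,\delta^{i+j-2}$$

Now

$$\sum_{k=1}^{n} D\mu_k = npq + (q - p)(p_1 - p)\,\frac{1 - \delta^n}{1 - \delta} - (p_1 - p)^2\,\frac{1 - \delta^{2n}}{1 - \delta^2}$$

and

$$\sum_{j>i} M(\mu_i - p_i)(\mu_j - p_j) = npq\,\frac{\delta}{1 - \delta} - pq\,\frac{\delta(1 - \delta^n)}{(1 - \delta)^2} + (p_1 - p)(q - p)\,\frac{\delta\left(1 - n\delta^{n-1} + (n-1)\delta^n\right)}{(1 - \delta)^2} - (p_1 - p)^2\,\frac{(1 - \delta^n)(\delta - \delta^n)}{(1 - \delta)(1 - \delta^2)}$$

Thus,

$$D\mu = npq\,\frac{1 + \delta}{1 - \delta} + \alpha_n$$

where $\alpha_n$ is a certain quantity that remains bounded as $n$ increases.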
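The expression for $M\mu$ in Example 6 can be checked by enumerating all paths of a short chain. The sketch below rests on assumptions that are not spelled out in this section: the chain is taken to be described by two hypothetical transition probabilities, `p_after_hit` (probability of $E$ after a trial in which $E$ occurred) and `p_after_miss` (probability of $E$ after a trial in which it did not), with $\delta$ equal to their difference and $p$ equal to `p_after_miss` divided by $1 - \delta$. Under these assumptions $p_k = p + (p_1 - p)\delta^{k-1}$ holds, and the code compares the enumerated mean with the closed form.

```python
from fractions import Fraction
from itertools import product


def chain_mean(n, p1, p_after_hit, p_after_miss):
    """Exact mean number of occurrences of E, by enumerating all 2**n paths."""
    total = Fraction(0)
    for path in product((0, 1), repeat=n):
        prob = p1 if path[0] else 1 - p1
        for previous, current in zip(path, path[1:]):
            p_event = p_after_hit if previous else p_after_miss
            prob *= p_event if current else 1 - p_event
        total += prob * sum(path)
    return total


def formula_mean(n, p1, p_after_hit, p_after_miss):
    """n*p + (p1 - p) * (1 - delta**n) / (1 - delta) under the stated assumptions."""
    delta = p_after_hit - p_after_miss
    p = p_after_miss / (1 - delta)
    return n * p + (p1 - p) * (1 - delta ** n) / (1 - delta)


args = (6, Fraction(1, 3), Fraction(3, 4), Fraction(1, 4))
print(chain_mean(*args), formula_mean(*args))
```

Only the expectation is checked here; the variance would require the second moment of the path counts and is not attempted in this sketch.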

Sec. 29. Mathematical Expectation Defined
in the Axiomatics of Kolmogorov

This section should be omitted in a first reading, as it requires
extended knowledge in the theory of integration. The general
conception presented here is a natural development of the construction
of concepts of a random event, of probability, and of a random
variable as given by A. N. Kolmogorov (see Secs. 9 and 21). In this
interpretation, the concept of expectation leads naturally to the
abstract Lebesgue integral.

By definition, the expectation of a random variable $\xi = f(e)$ is
the integral

$$M\xi = \int_U f(e)\,P(de)$$

Under the hypothesis $B$, the conditional expectation is

$$M(\xi \mid B) = \int_U f(e)\,P(de \mid B)$$

It can readily be proved that this definition is equivalent to the
following

$$M(\xi \mid B) = \frac{1}{P(B)}\int_B f(e)\,P(de)$$

which is often better suited to practical employment.

Let it be remarked that if an event $B$ is representable as the
sum of a finite or countable set of disjoint events $B_k$:

$$B = B_1 + B_2 + \ldots$$

then

$$\int_B f(e)\,P(de) = \sum_k \int_{B_k} f(e)\,P(de)$$

It is useful to note that whereas previously the proof of the
theorem on the expectation of a sum required rather lengthy
reasoning, now the theorem is a consequence of the formula

$$\int (f + g)\,P(de) = \int f\,P(de) + \int g\,P(de)$$

For the independent random variables $\xi$ and $\eta$ we previously
proved the formula

$$M(\xi \cdot \eta) = M\xi \cdot M\eta \tag{1}$$

only in the case of discrete random variables and in the case of
continuous random variables.

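In the general case, let us define the discrete random variables
$\xi_n$ and $\eta_n$ by the formulas

$$\xi_n = \frac{k}{n} \quad \text{for} \quad \frac{k}{n} \le \xi < \frac{k+1}{n}$$

$$\eta_n = \frac{k}{n} \quad \text{for} \quad \frac{k}{n} \le \eta < \frac{k+1}{n}$$

Then

$$M\xi_n\eta_n = M\xi_n \cdot M\eta_n$$

From well-known theorems on passing to the limit under the
sign of the Lebesgue integral we can readily derive that

$$\lim M\xi_n = M\xi, \quad \lim M\eta_n = M\eta, \quad \lim M(\xi_n \cdot \eta_n) = M(\xi \cdot \eta)$$

Thus, formula (1) is proved in the general case.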





We shall use the results obtained to derive a formula that gene¬ 
ralizes the result of Sec. 28 (Corollary 2). This formula will be 
obtained from the following theorem proved by A. N. Kolmogorov 
and Yu. V. Prokhorov. 

Given a sequence of random variables

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$

let

$$\zeta_\nu = \xi_1 + \xi_2 + \ldots + \xi_\nu$$

denote the sum of the first $\nu$ variables, the number of summands
$\nu$ itself being a random variable.

Denote by $S_m$ the event that $\nu = m$ and put

$$p_m = P\{\nu = m\}, \qquad P_n = P\{\nu \ge n\} = \sum_{m=n}^{\infty} p_m$$

Theorem. If for $n > m$ the random variable $\xi_n$ and the event $S_m$
are independent, there exist the expectations

$$a_n = M\xi_n$$

(and, hence, the quantities $c_n = M|\xi_n|$ are finite), and the series

$$\sum_{n=1}^{\infty} P_n c_n$$

converges, then it follows that the expectation of the variable $\zeta_\nu$
exists and is equal to

$$M\zeta_\nu = \sum_{n=1}^{\infty} p_n A_n$$

where

$$A_n = a_1 + a_2 + \ldots + a_n$$

Proof. By virtue of the assumptions that have been made

$$\sum_{n=1}^{\infty} p_n A_n = \sum_{n=1}^{\infty} P_n a_n$$

Since $\xi_n$ does not depend on the event $\{\nu < n\}$, it does not
depend on the contrary event $\{\nu \ge n\}$ either, and for this reason

$$a_n = M\xi_n = M\{\xi_n \mid \nu \ge n\}$$

Taking into account the equations that have just been written
and also the earlier given properties of conditional expectations,
we can write the following sequence of equalities:

$$\sum_{n=1}^{\infty} P_n a_n = \sum_{n=1}^{\infty} P\{\nu \ge n\}\, M\{\xi_n \mid \nu \ge n\} = \sum_{n=1}^{\infty} \int_{\{\nu \ge n\}} \xi_n\,P(de) = \sum_{n=1}^{\infty} \sum_{m=n}^{\infty} \int_{\{\nu = m\}} \xi_n\,P(de)$$


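And since the variable $|\xi_n|$ and the event $\{\nu \ge n\}$ are also
independent, it follows that

$$\sum_{n=1}^{\infty} \sum_{m=n}^{\infty} \left| \int_{S_m} \xi_n\,P(de) \right| \le \sum_{n=1}^{\infty} \sum_{m=n}^{\infty} \int_{S_m} |\xi_n|\,P(de) = \sum_{n=1}^{\infty} \int_{\{\nu \ge n\}} |\xi_n|\,P(de)$$

$$= \sum_{n=1}^{\infty} P\{\nu \ge n\}\, M\{|\xi_n| \mid \nu \ge n\} = \sum_{n=1}^{\infty} P\{\nu \ge n\}\, M|\xi_n| = \sum_{n=1}^{\infty} P_n c_n < +\infty$$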

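The estimate just obtained permits us to change the order of
summation and write the equation

$$\sum_{n=1}^{\infty} \sum_{m=n}^{\infty} \int_{S_m} \xi_n\,P(de) = \sum_{m=1}^{\infty} \sum_{n=1}^{m} \int_{S_m} \xi_n\,P(de) = \sum_{m=1}^{\infty} \int_{S_m} \zeta_m\,P(de)$$

Since

$$M\zeta_\nu = \int_U \zeta_\nu\,P(de) = \sum_{m=1}^{\infty} \int_{S_m} \zeta_m\,P(de)$$

it follows that the preceding equation proves the theorem.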

Corollary. If in the preceding theorem we put $a = a_1 = a_2 = \ldots$,
then

$$M\zeta_\nu = a\,M\nu = a \sum_{n=1}^{\infty} n\,p_n$$
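
The corollary can be checked numerically. The following is only a sketch under assumptions that are not in the text: the summands are taken uniform on (0, 1), so $a = 0.5$, and $\nu$ is taken to be the number of summands needed for the running sum to exceed 1 for the first time. The event $\{\nu = m\}$ then depends only on the first $m$ summands, so the independence condition of the theorem holds. The names `stopped_sum` and `estimate` are illustrative.

```python
import random

def stopped_sum(rng):
    """Add uniform(0, 1) terms until the running sum first exceeds 1.

    Returns (nu, zeta): the random number of summands and the stopped sum.
    """
    total, count = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        count += 1
    return count, total

def estimate(trials=200_000, seed=1):
    """Monte Carlo estimates of M(nu) and M(zeta_nu)."""
    rng = random.Random(seed)
    sum_nu = 0.0
    sum_zeta = 0.0
    for _ in range(trials):
        nu, zeta = stopped_sum(rng)
        sum_nu += nu
        sum_zeta += zeta
    return sum_nu / trials, sum_zeta / trials

a = 0.5  # expectation of each uniform(0, 1) summand
mean_nu, mean_zeta = estimate()
# The corollary predicts that mean_zeta is close to a * mean_nu.
print(mean_nu, mean_zeta, a * mean_nu)
```

The two printed estimates of $M\zeta_\nu$ should agree up to sampling error; the estimate of $M\nu$ can also be compared with Exercise 22 of this chapter.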


Sec. 30. Moments

The expectation of the variable $(\xi - a)^k$ is called the moment of
the $k$th order of the random variable $\xi$:

$$\nu_k(a) = M(\xi - a)^k \qquad (1)$$

If $a = 0$, the moment is called the $k$th moment about the origin.
It is readily seen that the first moment about the origin is the
expectation of the variable $\xi$.

If $a = M\xi$, then the moment is called the central moment. It is
easy to see that the central moment of the first order is zero,
while the central moment of the second order is the variance.





We will denote the moments about the origin by the letter $\nu_k$,
and the central moments by the letter $\mu_k$, the subscript in both
cases indicating the order of the moment.

For the second moment $\nu_2(a)$ we have the obvious equality

$$\nu_2(a) = M(\xi - a)^2 = M(\xi - M\xi)^2 + (a - M\xi)^2$$

from which we conclude that the second moment $\nu_2(a)$ has a least
value when $a = M\xi$.

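There is a simple relationship between the central moments and
the moments about the origin. Indeed,

$$\mu_n = M(\xi - M\xi)^n = \sum_{k=0}^{n} C_n^k (-M\xi)^{n-k} M\xi^k = \sum_{k=0}^{n} C_n^k (-M\xi)^{n-k} \nu_k \qquad (2)$$

Since $\nu_1 = M\xi$, it follows that

$$\mu_n = \sum_{k=2}^{n} (-1)^{n-k} C_n^k \nu_k \nu_1^{n-k} + (-1)^{n-1} (n-1)\, \nu_1^n \qquad (3)$$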

Let us write down the moment relations for the first four values 
of n: 


$$\mu_0 = 1,$$

$$\mu_1 = 0,$$

$$\mu_2 = \nu_2 - \nu_1^2,$$

$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3, \qquad (3')$$

$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$

These first moments play a particularly important role in statistics. 
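
The relations between central moments and moments about the origin can be verified with exact arithmetic. The sketch below uses a small discrete distribution chosen purely for illustration (the values and probabilities are hypothetical) and compares central moments computed directly with those obtained from formula (2) and from the explicit expressions for $\mu_3$ and $\mu_4$.

```python
from fractions import Fraction
from math import comb

# Hypothetical discrete distribution: possible values and their probabilities.
values = [0, 1, 3, 7]
probs = [Fraction(1, 10), Fraction(2, 5), Fraction(3, 10), Fraction(1, 5)]

def raw_moment(k):
    """Moment of order k about the origin, nu_k = M(xi^k)."""
    return sum(p * x**k for x, p in zip(values, probs))

def central_moment(n):
    """Central moment mu_n = M(xi - M xi)^n, computed from the definition."""
    mean = raw_moment(1)
    return sum(p * (x - mean)**n for x, p in zip(values, probs))

def central_from_raw(n):
    """Central moment obtained from moments about the origin, formula (2)."""
    v1 = raw_moment(1)
    return sum(comb(n, k) * (-v1)**(n - k) * raw_moment(k) for k in range(n + 1))

v1, v2, v3, v4 = (raw_moment(k) for k in range(1, 5))
mu3_formula = v3 - 3 * v2 * v1 + 2 * v1**3
mu4_formula = v4 - 4 * v3 * v1 + 6 * v2 * v1**2 - 3 * v1**4

print(central_moment(3), mu3_formula)
print(central_moment(4), mu4_formula)
```

Because `Fraction` arithmetic is exact, the two numbers on each printed line coincide exactly rather than up to rounding.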
The quantity

$$m_k = M|\xi - a|^k \qquad (4)$$

is called the absolute moment of the $k$th order.

According to Theorem 1 of Sec. 27,

$$\nu_k(a) = \int (x - a)^k\, dF(x) \qquad (5)$$

Since we agreed that the random variable $\xi$ has expectation only
when the integral depicting it converges absolutely, it is clear that
the $k$th moment of the variable $\xi$ exists if and only if the integral

$$\int |x|^k\, dF_\xi(x)$$

converges. From this remark it follows that if a random variable $\xi$
has a moment of the $k$th order, it also has moments of all positive
orders less than $k$. Indeed, since for $r < k$, $|x|^k > |x|^r$ if $|x| > 1$,
it follows that

$$\int |x|^r\, dF_\xi(x) = \int_{|x| \le 1} |x|^r\, dF_\xi(x) + \int_{|x| > 1} |x|^r\, dF_\xi(x) \le \int_{|x| \le 1} |x|^r\, dF_\xi(x) + \int_{|x| > 1} |x|^k\, dF_\xi(x)$$

The first integral on the right-hand side of the inequality is
finite by virtue of the finiteness of the limits of integration and
the boundedness of the integrand; the second integral converges by
assumption.

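Example. Find the central and the central absolute moments of
a normally distributed random variable:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}$$

We have

$$\mu_k = \frac{1}{\sigma\sqrt{2\pi}} \int (x - a)^k e^{-\frac{(x-a)^2}{2\sigma^2}}\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int x^k e^{-\frac{x^2}{2\sigma^2}}\, dx$$

For odd $k$, since the integrand is odd,

$$\mu_k = 0$$

For even $k$,

$$\mu_k = m_k = \sqrt{\frac{2}{\pi}}\, \frac{1}{\sigma} \int_0^{\infty} x^k e^{-\frac{x^2}{2\sigma^2}}\, dx$$

By the substitution $x^2 = 2\sigma^2 z$ we reduce the integral to the form

$$\mu_k = m_k = \sqrt{\frac{2}{\pi}}\, \sigma^k\, 2^{\frac{k-1}{2}} \int_0^{\infty} z^{\frac{k-1}{2}} e^{-z}\, dz = \sqrt{\frac{2}{\pi}}\, \sigma^k\, 2^{\frac{k-1}{2}}\, \Gamma\left(\frac{k+1}{2}\right) = \sigma^k (k-1)(k-3) \ldots 1 = \sigma^k \frac{k!}{2^{k/2} \left(\frac{k}{2}\right)!}$$

When $k$ is odd, the absolute moment is

$$m_k = \sqrt{\frac{2}{\pi}}\, \frac{1}{\sigma} \int_0^{\infty} x^k e^{-\frac{x^2}{2\sigma^2}}\, dx = \sqrt{\frac{2}{\pi}}\, \sigma^k\, 2^{\frac{k-1}{2}}\, \Gamma\left(\frac{k+1}{2}\right) = \sqrt{\frac{2}{\pi}}\, 2^{\frac{k-1}{2}} \left(\frac{k-1}{2}\right)!\, \sigma^k$$

The moments of distribution functions cannot be arbitrary
quantities. Indeed, no matter what the constants $t_0, t_1, \ldots, t_n$, the
quadratic form

$$\int \left( \sum_{k=0}^{n} t_k (x - a)^k \right)^2 dF(x) = \sum_{j=0}^{n} \sum_{k=0}^{n} \nu_{j+k}(a)\, t_j t_k \ge 0$$

is nonnegative; for this reason, the moments $\nu_j(a)$ should satisfy the
following inequalities:

$$\begin{vmatrix} \nu_0(a) & \nu_1(a) & \ldots & \nu_k(a) \\ \nu_1(a) & \nu_2(a) & \ldots & \nu_{k+1}(a) \\ \ldots & \ldots & \ldots & \ldots \\ \nu_k(a) & \nu_{k+1}(a) & \ldots & \nu_{2k}(a) \end{vmatrix} \ge 0$$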

The absolute moments obey analogous inequalities. 
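
The closed-form moments of the normal law obtained in the example above can be compared with a direct numerical integration. This is a minimal sketch: the midpoint rule, the integration width of 12 standard deviations, and the function names are choices made here for illustration, not part of the text.

```python
from math import exp, factorial, pi, sqrt

def abs_moment_numeric(k, sigma, steps=100_000, width=12.0):
    """Midpoint-rule estimate of M|xi - a|^k for a normal law with std sigma."""
    upper = width * sigma
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**k * exp(-x * x / (2 * sigma**2))
    # sqrt(2/pi)/sigma times the integral over (0, infinity)
    return sqrt(2 / pi) / sigma * total * h

def even_moment_closed(k, sigma):
    """sigma^k * k! / (2^(k/2) * (k/2)!) for even k."""
    return sigma**k * factorial(k) / (2**(k // 2) * factorial(k // 2))

def odd_abs_moment_closed(k, sigma):
    """sqrt(2/pi) * 2^((k-1)/2) * ((k-1)/2)! * sigma^k for odd k."""
    half = (k - 1) // 2
    return sqrt(2 / pi) * 2**half * factorial(half) * sigma**k

sigma = 1.5  # hypothetical standard deviation
for k in (2, 4):
    print(k, abs_moment_numeric(k, sigma), even_moment_closed(k, sigma))
for k in (1, 3):
    print(k, abs_moment_numeric(k, sigma), odd_abs_moment_closed(k, sigma))
```

For $k = 2$ the closed form reduces to $\sigma^2$, the variance, which gives a quick sanity check on both columns.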

Concerning absolute moments we prove the following theorem. 

Theorem. If a random variable $\xi$ has an absolute moment of
order $k$, then for any $t$ and $\tau$ $(0 < t < \tau \le k)$

$$\sqrt[t]{m_t} \le \sqrt[\tau]{m_\tau}$$

where

$$m_t = M|\xi - a|^t$$

and $a$ is any real number.

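Proof. First we prove the theorem for the case when $t$, $\tau$ and $k$
are rational numbers. For the sake of definiteness, let

$$t = \frac{p}{q}, \qquad \tau = \frac{s}{q}, \qquad k = \frac{u}{q}$$

and, by hypothesis,

$$p < s \le u$$

Now let $r$ be some positive integer less than $u$. We consider the
nonnegative quadratic form in the variables $u$ and $v$

$$m_{\frac{r-1}{q}}\, u^2 + 2 m_{\frac{r}{q}}\, uv + m_{\frac{r+1}{q}}\, v^2 = \int \left( u\, |x - a|^{\frac{r-1}{2q}} + v\, |x - a|^{\frac{r+1}{2q}} \right)^2 dF(x)$$

The condition that it be nonnegative is, as we know, that

$$m_{\frac{r}{q}}^2 \le m_{\frac{r-1}{q}}\, m_{\frac{r+1}{q}}$$

This inequality may obviously also be written as

$$m_{\frac{r}{q}}^{2r} \le m_{\frac{r-1}{q}}^{r}\, m_{\frac{r+1}{q}}^{r}$$

If we assign to the index a succession of values from 1 to $r$, we get
a sequence of inequalities:

$$m_{\frac{1}{q}}^{2} \le m_0\, m_{\frac{2}{q}},$$

$$m_{\frac{2}{q}}^{4} \le m_{\frac{1}{q}}^{2}\, m_{\frac{3}{q}}^{2},$$

$$\ldots$$

$$m_{\frac{r}{q}}^{2r} \le m_{\frac{r-1}{q}}^{r}\, m_{\frac{r+1}{q}}^{r}$$

Note that $m_0$ is always equal to 1. Then, multiplying these
inequalities together and cancelling, we arrive at the inequality

$$m_{\frac{r}{q}}^{r+1} \le m_{\frac{r+1}{q}}^{r}$$

Thus,

$$m_{\frac{r}{q}}^{\frac{1}{r}} \le m_{\frac{r+1}{q}}^{\frac{1}{r+1}}$$

or, raising both sides to the power $q$,

$$m_{\frac{r}{q}}^{\frac{q}{r}} \le m_{\frac{r+1}{q}}^{\frac{q}{r+1}}$$

This inequality quite obviously proves the theorem for the case
of $t$, $\tau$ and $k$ being rational.

Since the function $m_t$ is continuous with respect to the argument
$t$ in the region $0 < t \le k$, passage to the limit will convince us
that the theorem holds for any $t$, $\tau$ and $k$.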

Note that the foregoing theorem contains the following impor¬ 
tant property of moments: 

$$m_1 \le m_2^{\frac{1}{2}} \le m_3^{\frac{1}{3}} \le \ldots \le m_k^{\frac{1}{k}}$$
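
The monotonicity of $\sqrt[t]{m_t}$ in $t$ is easy to observe numerically. In the sketch below the distribution, the centre $a$ and the list of orders are hypothetical values chosen for illustration; non-integer orders are included because the theorem is not restricted to integers.

```python
# Hypothetical discrete distribution and an arbitrary real centre a.
values = [-2.0, 0.5, 1.0, 4.0]
probs = [0.2, 0.3, 0.4, 0.1]
a = 0.7

def abs_moment(t):
    """Absolute moment m_t = M|xi - a|^t of the distribution above."""
    return sum(p * abs(x - a)**t for x, p in zip(values, probs))

orders = [0.5, 1, 1.5, 2, 3, 4.5, 6]
roots = [abs_moment(t) ** (1 / t) for t in orders]

# The theorem states that this sequence is nondecreasing.
for t, value in zip(orders, roots):
    print(t, value)
```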

In the examples of the preceding sections, the first two moments 
of a random variable fully determined its distribution function if 
the type of function was known beforehand (this occurred in the case 
of the normal, Poisson, uniform and other distributions). A substan¬ 
tial role in mathematical statistics is played by distribution laws that 
depend on more than two parameters. If it is known beforehand that 
a random variable obeys a definite kind of law and only the values of 
the parameters are unknown, then these parameters are determined 
in the most important cases by the first moments. But if we do not 
know the type of distribution function, then, generally speaking, 
not only a knowledge of the first moments but also of all integral 






moments will fail to determine the unknown distribution function. 
It is possible, it turns out, to construct examples of distribution fun¬ 
ctions with identical moments of all integral-valued orders. This brings 
forth the following problem (the problem of moments): given a sequence
of constants

$$c_0 = 1, c_1, c_2, c_3, \ldots$$

(1) Under what conditions does there exist a distribution function
$F(x)$ such that for all $n$ the following equation is valid:

$$c_n = \int x^n\, dF(x)?$$


(2) When is this function unique? 

This problem has been completely solved, but we shall not go into 
the solution for it would take us outside the scope of this book. 

We shall define some more numerical characteristics of random va¬ 
riables that are frequently used in theory and applications. 

The median of a distribution $F(x)$ is that value of the argument $m$
for which the following inequalities hold:

$$F(m) \le \frac{1}{2} \le F(m + 0)$$

If $F(x)$ is continuous, there exists at least one $m$ for which the
equation

$$F(m) = 0.5$$

is valid. If the curve $y = F(x)$ and the straight line $y = 0.5$ have a
common closed interval, then any point of this interval is the median.

The median exists for all distributions, but the expectation may not. 

Note that the median has the following property: 

Theorem. The absolute moment $M|\xi - c|$ for a continuous
distribution $F(x)$ assumes a least value if $c$ is chosen equal to the
median of the distribution.

Proof. The theorem follows immediately from the following easily
verifiable equality:

$$M|\xi - c| = \begin{cases} M|\xi - m| + 2 \int_m^c (c - x)\, dF(x) & \text{if } c > m \\ M|\xi - m| + 2 \int_c^m (x - c)\, dF(x) & \text{if } c < m \end{cases}$$

inasmuch as the second term in both cases is nonnegative for $c \ne m$.

The median of the normal distribution is equal to its mean (expe¬ 
ctation). 
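
The minimizing property of the median can be illustrated on a distribution whose mean and median differ. The sketch below assumes, purely for illustration, the exponential law with density $e^{-x}$ for $x \ge 0$; for it $M|\xi - c| = c - 1 + 2e^{-c}$ when $c \ge 0$ (obtained by direct integration), the median is $\ln 2$ and the expectation is 1.

```python
from math import exp, log

def mean_abs_dev(c):
    """M|xi - c| for the exponential law with density e^{-x}, x >= 0 (c >= 0)."""
    return c - 1 + 2 * exp(-c)

median = log(2)  # root of F(m) = 1 - e^{-m} = 0.5
mean = 1.0       # expectation of the same law

# Search a fine grid of candidate centres c in [0, 3].
grid = [i / 1000 for i in range(0, 3001)]
best_c = min(grid, key=mean_abs_dev)

print(best_c, median)
print(mean_abs_dev(median), mean_abs_dev(mean))
```

The grid minimum lands next to $\ln 2$, and the absolute deviation about the median is smaller than about the expectation.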





Just as we defined the median, we define for any number
$p$ $(0 < p < 1)$ the distribution quantile of order $p$. We confine ourselves
to the case of a continuous distribution. Any root of the equation
$F(x) = p$ is called the quantile of order $p$. Clearly, the median is the
quantile of order 1/2. If in a distribution the quantiles are known for
a large number of values, say for $p$ = 0.1; 0.2; ...; 0.9 (these quantiles
are called deciles), they give a sufficiently complete idea about the
peculiarities of the distribution.

If a random variable is continuous, i.e. its distribution function 
has a density, then the value of the argument for which the density
is a maximum is called the mode of the distribution. For the normal 
distribution, the mode coincides with the median and the expectation. 

Of the other numerical characteristics, most essential are semi¬ 
invariants (or cumulants), which will be defined in Chapter 7. For 
the present we note the following. In the addition of independent 
random variables the moment of a sum is, generally speaking, not 
equal to the sum of the moments of the summands. For the
moment of the sum of the independent variables $\xi$ and $\eta$ we have
the equation

$$M(\xi + \eta)^n = \sum_{k=0}^{n} C_n^k\, M\xi^k\, M\eta^{n-k}$$

Cumulants of different orders have the property that upon
addition of the independent terms the cumulant of the sum is equal to
the sum of the cumulants of summands of the same order. It turns
out that the cumulant of any order $k$ is a rational function of the
moments of orders less than or equal to $k$.
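
The contrast between moments and cumulants under addition can be seen on a small exact example. The sketch below relies on the standard expressions of the low-order cumulants through central moments, which are an assumption here since the text defers the definition to Chapter 7: the second and third cumulants equal $\mu_2$ and $\mu_3$, and the fourth equals $\mu_4 - 3\mu_2^2$. The two distributions are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def central_moment(dist, n):
    """Central moment of order n of a discrete law given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return sum(p * (x - mean)**n for x, p in dist.items())

def fourth_cumulant(dist):
    """Fourth cumulant via the standard formula mu_4 - 3 * mu_2^2 (assumed here)."""
    return central_moment(dist, 4) - 3 * central_moment(dist, 2)**2

def convolve(d1, d2):
    """Distribution of the sum of two independent discrete variables."""
    out = defaultdict(Fraction)
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] += p * q
    return dict(out)

xi = {0: Fraction(1, 4), 1: Fraction(1, 2), 5: Fraction(1, 4)}
eta = {-1: Fraction(1, 3), 2: Fraction(2, 3)}
total = convolve(xi, eta)

# Fourth central moments do not add; fourth cumulants do.
print(central_moment(total, 4), central_moment(xi, 4) + central_moment(eta, 4))
print(fourth_cumulant(total), fourth_cumulant(xi) + fourth_cumulant(eta))
```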


EXERCISES 


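1. A random variable $\xi$ takes on only integral nonnegative values with
probabilities

(a) $P\{\xi = k\} = \frac{a^k}{(1 + a)^{k+1}}$, $a > 0$ is a constant (this is the Pascal distribution).

(b) $p_k = P\{\xi = k\} = \left( \frac{\lambda}{1 + \alpha\lambda} \right)^k \frac{(1 + \alpha)(1 + 2\alpha) \ldots (1 + (k-1)\alpha)}{k!}\, p_0$ for $k > 0$, where
$\alpha > 0$, $\lambda > 0$ and

$$p_0 = P\{\xi = 0\} = (1 + \alpha\lambda)^{-\frac{1}{\alpha}}$$

This is the Polya distribution.

Find $M\xi$ and $D\xi$.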

2. Let $\mu$ be the number of occurrences of an event $A$ in $n$ independent trials,
in each of which $P(A) = p$. Find

(a) $M\mu^3$, (b) $M\mu^4$, (c) $M|\mu - np|$.





3. The probability that event $A$ will occur in the $i$th trial is $p_i$. Let $\mu$ be
the number of occurrences of $A$ in the first $n$ independent trials. Find

(a) $M\mu$, (b) $D\mu$, (c) $M\left( \mu - \sum_{i=1}^{n} p_i \right)^3$ and (d) $M\left( \mu - \sum_{i=1}^{n} p_i \right)^4$.

4. Prove that, given the conditions of the preceding problem, $D\mu$ reaches
a maximum for the given value of $a = \frac{1}{n} \sum_{i=1}^{n} p_i$ provided

$$p_1 = p_2 = \ldots = p_n = a$$


5. Let $\mu$ be the number of occurrences of an event $A$ in $n$ independent trials,
in each of which $P(A) = p$. Also, let a variable $\eta$ be 0 or 1 depending on
whether $\mu$ proves to be even or odd. Find $M\eta$.

6. The density function of a random variable $\xi$ is

$$p(x) = \frac{1}{2\alpha}\, e^{-\frac{|x - a|}{\alpha}}$$

(the Laplace distribution). Find $M\xi$ and $D\xi$.

7. The density function of the absolute speed of a molecule is given by the
Maxwell distribution

$$p(x) = \frac{4 x^2}{\alpha^3 \sqrt{\pi}}\, e^{-\frac{x^2}{\alpha^2}} \quad \text{for } x \ge 0$$

and $p(x) = 0$ for $x < 0$ ($\alpha > 0$ is a constant). Find the average speed of a
molecule, its variance, the mean kinetic energy of the molecule (the mass of the
molecule is $m$) and the variance of the kinetic energy.

8. The probability density that a molecule in Brownian motion will be at
a distance $x$ from a reflecting wall at time $t$, if at time $t_0$ it was at a distance $x_0$,
is given by the formula

$$p(x) = \begin{cases} \frac{1}{2\sqrt{\pi D t}} \left[ e^{-\frac{(x + x_0)^2}{4Dt}} + e^{-\frac{(x - x_0)^2}{4Dt}} \right] & \text{for } x \ge 0 \\ 0 & \text{for } x < 0 \end{cases}$$

Find the expectation and the variance of the magnitude of the displacement of the
molecule during the time from $t_0$ to $t$.

9. Prove that for an arbitrary random variable |, the possible values of which 
lie in the interval (a, b), the following inequalities are valid: 

and 

10. Let Xi, Xg, ... , x^ be possible values of a random variable |. Prove that 
as n —► 00 

(^) -(t)) ► max xj 


M|« 


i<l<k 


1 < / < fe 


11. Let F(x) be the distribution function of Prove that if M| exists, then 


[l-f (*)+f 

0 





and for the existence of $M\xi$ it is necessary that

$$\lim_{x \to -\infty} x F(x) = \lim_{x \to \infty} x\, [1 - F(x)] = 0$$

12. Two points are dropped at random on the line-segment (0, 1). Find the 
expectation, variance and the expectation of the nth power of the distance 
between them. 

13. A random variable $\xi$ is distributed according to the logarithmic normal law;
i.e., for $x > 0$ the density function of $\xi$ is

$$p(x) = \frac{1}{x \beta \sqrt{2\pi}}\, e^{-\frac{(\ln x - a)^2}{2\beta^2}}$$

($p(x) = 0$ for $x \le 0$). Find $M\xi$ and $D\xi$.

(A. N. Kolmogorov has demonstrated that particle sizes in crushing obey the 
logarithmic normal distribution law.) 

14. A random variable $\xi$ is normally distributed. Find $M|\xi - a|$ where $a = M\xi$.

15. A box contains $2^n$ tickets; the number $i$ $(i = 0, 1, \ldots, n)$ is written on
$C_n^i$ of them. A total of $m$ tickets are drawn at random, $s$ is the sum of the
numbers written on them; find $Ms$ and $Ds$.

16. The random variables $\xi_1, \xi_2, \ldots, \xi_{n+m}$ $(n > m)$ are independent, identically
distributed and have a finite variance. Find the correlation coefficient of the sums

$$s = \xi_1 + \xi_2 + \ldots + \xi_n \quad \text{and} \quad \sigma = \xi_{m+1} + \xi_{m+2} + \ldots + \xi_{m+n}$$

17. The random variables $\xi$ and $\eta$ are independent and are normally distributed
with the same parameters $a$ and $\sigma$. Find the correlation coefficient of the
quantities $\alpha\xi + \beta\eta$ and $\alpha\xi - \beta\eta$, and also their joint distribution.

18. A random vector $(\xi, \eta)$ is normally distributed; $M\xi = a$, $M\eta = b$, $D\xi = \sigma_1^2$,
$D\eta = \sigma_2^2$, and $R$ is the correlation coefficient of $\xi$ and $\eta$. Prove that $R = \cos q\pi$
where $q = P\{(\xi - a)(\eta - b) < 0\}$.

19. Let $x_1$ and $x_2$ be the results of two independent observations of a
normally distributed variable $\xi$. Prove that $M \max(x_1, x_2) = a + \frac{\sigma}{\sqrt{\pi}}$ where
$a = M\xi$, $\sigma^2 = D\xi$.

20. A random vector $(\xi, \eta)$ is normally distributed, $M\xi = M\eta = 0$, $D\xi = D\eta = 1$,
$M\xi\eta = R$. Prove that

$$M \max(\xi, \eta) = \sqrt{\frac{1 - R}{\pi}}$$

21. The unevenness in length of cotton fibre is given by 

a 


where a is the expectation of fibre length, a" is the expectation of lengths of 
the fibres longer than a, and a' is the expectation of lengths of fibres shorter 
than a. Find the relation between the following quantities: 

(a) $\lambda$, $a$, $M|\xi - a|$;

(b) $\lambda$, $a$ and $\sigma$ if $\xi$ is normally distributed.





22. The random variables $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ are independent and uniformly
distributed over (0, 1). Let $\nu$ be a random variable equal to the $k$ for which
the sum

$$s_k = \xi_1 + \xi_2 + \ldots + \xi_k$$

exceeds 1 for the first time. Prove that $M\nu = e$.

23. Let $\xi$ be a random variable with density function

Find $M \min(|\xi|, 1)$.

(Problems 22 and 23 were communicated to me by M. I. Yadrenko.) 



CHAPTER 


6 

The Law of Large Numbers 


Sec. 31. Mass-Scale Phenomena and the Law 
of Large Numbers 

The vast experience accumulated by mankind teaches us that phe¬ 
nomena which have probability extremely close to unity almost 
definitely take place. Conversely, events the probability of occur¬ 
rence of which is very small (close to zero, in other words) occur very 
infrequently. This circumstance plays a basic role in all practical 
conclusions from probability theory, for this experimental fact enables 
us in practical activities to consider events that are highly improba¬ 
ble to be practically impossible, and events that occur with probabili¬ 
ties close to one as practically certain events. And yet we are not able 
to give an unambiguous answer to the very natural question: what 
must the probability be so that we can regard an event as practically 
impossible (practically certain). This is quite natural, since in pra¬ 
ctical affairs one has to take into account the importance of the events 
we deal with. 

For instance, if in measuring the distance between two villages 
it were found equal to 5,340 metres and the error of this measurement 
were equal to or greater than 20 metres with a probability 0.02, we 
could neglect the possibility of such an error and consider that the 
distance is indeed equal to 5,340 metres. Thus, in our case we consider 
the event having probability 0.02 as of practically no importance and 
disregard it in our practical work. Yet in other cases one cannot neg¬ 
lect probabilities of 0.02 and even less. To illustrate, suppose, in 
the construction of a large hydroelectric power station that requires 
enormous expenditure of materials and manpower, it were found that 
the probability of a catastrophic flood level were equal to 0.02 under 
the conditions at hand, then this probability would be considered 
high and it would have to be taken into account in the designing of 
the station and not neglected, as in the earlier example. 

Thus, only the demands of actual practice can suggest the criteria 
according to which events are to be regarded as practically impos¬ 
sible or practically certain. 





At the same time we must note that any event having positive 
probability, no matter how small, can occur, and if the number of 
trials in each of which it can occur with one and the same probability 
is very great, then the probability of at least a single occurrence may 
be arbitrarily close to unity. This circumstance should be constantly 
borne in mind. However, if the probability of some event is very small, 
then it is exceedingly difficult to expect its occurrence in some trial 
specified beforehand. Thus, if somebody asserts that in the first deal 
of cards between four players each will receive cards of only one suit, 
then it is natural to suspect that the dealer had certain things in mind, 
say definite order of the cards known only to him. This confidence is 
based on the fact that the probability of such a deal, given well
shuffled cards, is equal to $(9!)^4\, 4!/36! \approx 1.1 \times 10^{-18}$, which is
extraordinarily small. Be that as it may, it is on record that cards have been dealt
in that way. This instance is a sufficiently good illustration of the 
difference between the notion of practical impossibility and categori¬ 
cal, so to say, impossibility. 
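
The order of magnitude quoted for the one-suit deal can be recomputed exactly. The sketch assumes the setting implied by the formula: a 36-card deck with four suits of nine cards, dealt nine cards to each of four players, all deals equally likely.

```python
from fractions import Fraction
from math import factorial

# Number of equally likely deals: 36! / (9!)^4 ways to split the deck
# into four ordered hands of nine cards.
# Favourable deals: 4! ways to assign the four suits to the four players.
probability = Fraction(factorial(4) * factorial(9)**4, factorial(36))

print(float(probability))
```

The printed value is of the order of $10^{-18}$, in agreement with the figure in the text.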

From what has been said it is clear that in our practical activities, 
and in general theoretical problems as well, events with probabilities 
close to unity or zero are of great importance. It is thus clear that 
one of the principal problems of probability theory should be the es¬ 
tablishment of regularities involving probabilities close to unity; 
here, a particular role should be played by laws that arise due to the 
superimposition of a large number of independent or weakly depen¬ 
dent random factors. The law of large numbers is one such proposition 
of the theory of probability and the most important one. 

It would now be natural to regard the law of large numbers as de¬ 
fining the entire assemblage of propositions asserting with probabi¬ 
lity arbitrarily close to unity that some event will occur that depends 
on a boundlessly increasing number of random events, each of which 
exerts on it only a slight effect. 

This general conception of theorems akin to the law of large
numbers may be formulated somewhat more definitely. Let there be given
a sequence of random variables

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots \qquad (1)$$

We consider the random variables $\zeta_n$ which are certain specified
symmetric functions of the first $n$ variables of the sequence (1):

$$\zeta_n = f_n(\xi_1, \xi_2, \ldots, \xi_n)$$

If there exists a sequence of constants $a_1, a_2, \ldots, a_n, \ldots$ such that
for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\{|\zeta_n - a_n| < \varepsilon\} = 1 \qquad (2)$$

then the sequence (1) obeys the law of large numbers with given
functions $f_n$.

Ordinarily, however, a much more definite meaning is ascribed
to the concept of the law of large numbers. Namely, we restrict
ourselves to the case when $\zeta_n$ is the arithmetic mean of the variables
$\xi_1, \xi_2, \ldots, \xi_n$.

If all the quantities $a_n$ in relation (2) are equal to one and the same
quantity $a$, then we say that the random variables $\zeta_n$ converge in
probability to $a$. In these terms, relation (2) means that $\zeta_n - a_n$
converges in probability to zero.

When studying single phenomena, we observe them together with 
all their individual peculiarities that obscure the manifestation of 
laws involved in the observation of large numbers of similar phenome¬ 
na. It was noticed a long time ago that factors which are not connected 
with the essence of the process as a whole and which appear only in 
single instances mutually cancel out when one considers the average 
of a large number of observations. 

Later, this empirical result was noted with increasing frequency, 
yet as a rule no attempt was ever made to give any theoretical expla¬ 
nation. Incidentally, for many authors no explanation was required, 
since the presence of regularities both in natural and social phenomena 
was, to them, nothing other than a manifestation of the rules of
divine order.

Even today some authors impoverish the content of the law of 
large numbers and even distort its methodological significance by 
simply reducing it to an experimentally observed regularity. Actual¬ 
ly, the enduring scientific value of the investigations of Chebyshev, 
Markov and other researchers in the field of the law of large numbers 
does not consist in the fact that they detected the empirical stability 
of means, but in the fact that they found the general conditions whose 
fulfillment definitely brings about the statistical stability of means. 

To illustrate the operation of the law of large numbers, we take 
the following schematic example. According to modern physical 
views, a gas consists of an enormous number of individual parti¬ 
cles in constant and chaotic motion. Speaking of each separate mole¬ 
cule, one cannot predict the velocity it will have and the place it 
will be in at any given instant of time. However, we can, given certain 
conditions of the gas, calculate the portion of molecules that will 
be moving with a given velocity or the portion of them that will be 
located in a given volume. But, strictly speaking, that is precisely 
what the physicist wishes to know, since the basic characteristics 
of a gas—pressure, temperature, viscosity, and so forth—are deter¬ 
mined not by the bizarre behaviour of a single molecule but by the 
collective action of all of them. Thus, the pressure of a gas is equal 
to the overall action of molecules impinging on a plate of unit area in 
unit time. The number of impacts and the speeds of the impinging 





molecules vary according to chance; however, by virtue of the law 
of large numbers (in Chebyshev’s form) the pressure should be nearly 
constant. This “equalizing” effect of the law of large numbers in phy¬ 
sical phenomena is exhibited with exceptional exactitude. Suffice 
it to recall that, say, under ordinary conditions even very precise 
measurements hardly at all permit noticing deviations from Pascal’s 
law of the pressure of a liquid. This extraordinary agreement of theory 
and experiment even served the opponents of the molecular structure 
of matter with a peculiar kind of argument: if matter were molecular 
in structure, then departures from Pascal’s law would be in evidence. 
Such deviations, the so-called fluctuations of pressure, were actually 
observed when scientists had learned to isolate relatively small quan¬ 
tities of molecules, as a result of which the effect of separate mole¬ 
cules did not completely even out and remained rather strong. 

Sec. 32. Chebyshev's Form of the Law
of Large Numbers 

We shall now formulate and prove the theorems of Chebyshev, 
Markov and others. The method used in this case belongs to Che¬ 
byshev. 

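Chebyshev's Inequality. For every random variable $\xi$ having a
finite variance, the inequality

$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{D\xi}{\varepsilon^2} \qquad (1)$$

is valid for every $\varepsilon > 0$.

Proof. If $F(x)$ denotes the distribution function of the random
variable $\xi$, then

$$P\{|\xi - M\xi| \ge \varepsilon\} = \int_{|x - M\xi| \ge \varepsilon} dF(x)$$

Since in the region of integration

$$\frac{|x - M\xi|}{\varepsilon} \ge 1$$

it follows that

$$\int_{|x - M\xi| \ge \varepsilon} dF(x) \le \frac{1}{\varepsilon^2} \int_{|x - M\xi| \ge \varepsilon} (x - M\xi)^2\, dF(x)$$

But we only strengthen this inequality by extending the
integration to all values of $x$:

$$\int_{|x - M\xi| \ge \varepsilon} dF(x) \le \frac{1}{\varepsilon^2} \int (x - M\xi)^2\, dF(x) = \frac{D\xi}{\varepsilon^2}$$

This completes the proof of Chebyshev's inequality.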
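
Chebyshev's inequality can be checked exactly on a simple case. The sketch below takes, as an assumed example, the number of points on a fair die and compares the exact tail probability $P\{|\xi - M\xi| \ge \varepsilon\}$ with the bound $D\xi/\varepsilon^2$ for several values of $\varepsilon$.

```python
from fractions import Fraction

values = range(1, 7)   # faces of a fair die (illustrative choice)
p = Fraction(1, 6)     # probability of each face

mean = sum(p * x for x in values)
variance = sum(p * (x - mean)**2 for x in values)

def tail(eps):
    """Exact probability P{|xi - M xi| >= eps}."""
    return sum(p for x in values if abs(x - mean) >= eps)

def bound(eps):
    """Chebyshev bound D xi / eps^2."""
    return variance / eps**2

epsilons = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2)]
for eps in epsilons:
    print(eps, tail(eps), bound(eps))
```

For small $\varepsilon$ the bound exceeds 1 and is uninformative; it becomes useful only when $\varepsilon$ is large compared with the standard deviation.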





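Chebyshev's Theorem. If $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ are a sequence of
pairwise independent random variables having finite variances,
bounded by one and the same constant

$$D\xi_1 \le C, \quad D\xi_2 \le C, \quad \ldots, \quad D\xi_n \le C, \quad \ldots$$

then, for any constant $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \right| < \varepsilon \right\} = 1 \qquad (2)$$

Proof. We know that under the conditions of the theorem

$$D\left( \frac{1}{n} \sum_{k=1}^{n} \xi_k \right) = \frac{1}{n^2} \sum_{k=1}^{n} D\xi_k$$

and, consequently,

$$D\left( \frac{1}{n} \sum_{k=1}^{n} \xi_k \right) \le \frac{C}{n}$$

According to the Chebyshev inequality,

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \right| < \varepsilon \right\} \ge 1 - \frac{D\left( \frac{1}{n} \sum_{k=1}^{n} \xi_k \right)}{\varepsilon^2} \ge 1 - \frac{C}{n\varepsilon^2}$$

Passing to the limit as $n \to \infty$, we get

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \right| < \varepsilon \right\} \ge 1$$

And since probability cannot exceed one, the theorem thus follows.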

We shall take note of certain important special cases of Cheby¬ 
shev’s theorem. 


1. Bernoulli's Theorem. Let $\mu$ be the number of occurrences of an
event $A$ in $n$ independent trials and $p$ the probability of occurrence
of event $A$ in each of the trials. Then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left\{ \left| \frac{\mu}{n} - p \right| < \varepsilon \right\} = 1 \qquad (3)$$

Proof. Indeed, by introducing the random variables $\mu_k$ equal
to the number of occurrences of $A$ in the $k$th trial, we have

$$\mu = \mu_1 + \mu_2 + \ldots + \mu_n$$

And since

$$M\mu_k = p, \qquad D\mu_k = pq \le \frac{1}{4}$$

it follows that the Bernoulli theorem is an elementary special case
of the Chebyshev theorem.

Since in practical work it is frequently necessary to determine 
unknown probabilities in approximate fashion experimentally, ag¬ 
reement between the Bernoulli theorem and experiment has been 
verified by performing large numbers of experiments. Here, events 
were considered in which the probabilities may, for one reason or 
another, be regarded as known, and concerning which it was easy 
to perform trials and ensure the independence of the trials as well 
as the constancy of the probabilities in each of the trials. All such 
experiments yielded excellent agreement with theory. We indicate 
the results of some of these readily reproducible experiments. 



A deck of 36 cards was divided in half at random 100 times. The 
results are tabulated in Table 11. The first column indicates the num¬ 
ber of the trial, the second, the number of red cards in half the deck, 
the third, the number of cases in which red and black cards came out 
half and half in the trials, and, finally, the fourth column gives the 
frequencies. 

In Example 3, Sec. 5, it was computed that the probability of
obtaining equal numbers of black and red cards in each half deck is

$$p = \frac{(18!)^4}{36!\, (9!)^4} \approx 0.26$$
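
The value 0.26 can be reproduced by a direct count. The sketch assumes a 36-card deck with 18 red and 18 black cards split into two halves of 18 cards; the favourable splits are those with 9 red and 9 black cards in the first half. The factorial form quoted in the text is computed alongside for comparison.

```python
from fractions import Fraction
from math import comb, factorial

# Choose 18 of the 36 cards for the first half; a split is favourable
# when exactly 9 of the 18 red and 9 of the 18 black cards are chosen.
p_comb = Fraction(comb(18, 9)**2, comb(36, 18))

# The same probability written with factorials.
p_fact = Fraction(factorial(18)**4, factorial(36) * factorial(9)**4)

print(float(p_comb), float(p_fact))
```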

The curve in Fig. 20 gives a clear-cut idea of the variation of the
frequency $\mu/n$ as a function of the number of trials. At first, when the
number of experiments is small, the broken line sometimes departs
appreciably from the straight line $y = p \approx 0.26$. Then, as the number
of experiments increases, the broken line, on the whole, comes closer
and closer to the straight line.

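In the case at hand, the result was a rather considerable final (for
$n = 100$) deviation of frequency from the probability (roughly equal
to 0.02). By the Laplace theorem, the probability of obtaining such
a deviation or a greater one is equal to

$$P\left\{ \left| \frac{\mu}{n} - p \right| \ge 0.02 \right\} = P\left\{ \left| \frac{\mu - np}{\sqrt{npq}} \right| \ge 0.02 \sqrt{\frac{n}{pq}} \right\} \approx 1 - 2\Phi\left( 0.02 \sqrt{\frac{n}{pq}} \right)$$

$$= 1 - 2\Phi\left( 0.02 \sqrt{\frac{100}{0.26 \cdot 0.74}} \right) = 1 - 2\Phi(0.455) \approx 0.65$$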

Thus, if the experiment is repeated a large number of times, roughly 
in two thirds of the cases the deviation will not be less than what we 
obtained in our experiment. 
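
The arithmetic of this estimate can be retraced in a few lines. The sketch assumes that $\Phi(x)$ denotes the integral of the standard normal density from 0 to $x$, which is the reading consistent with the numbers in the text; under that assumption $\Phi(x)$ equals half the error function evaluated at $x/\sqrt{2}$.

```python
from math import erf, sqrt

def laplace_phi(x):
    """Phi(x) = (1/sqrt(2*pi)) * integral from 0 to x of exp(-z^2/2) dz."""
    return 0.5 * erf(x / sqrt(2))

n, p = 100, 0.26
q = 1 - p

z = 0.02 * sqrt(n / (p * q))    # normalized deviation
prob = 1 - 2 * laplace_phi(z)   # probability of a deviation at least this large

print(z, prob)
```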

The eighteenth century French naturalist Buffon tossed a coin 
4040 times with heads appearing 2048 times. In Buffon’s experi¬ 
ment, the frequency of heads turning up is approximately equal 
to 0.507. 

The English statistician Karl Pearson tossed a coin 12,000 times 
and obtained heads 6,019 times. The frequency of heads in Pearson's 
experiment is 0.5016. 

On another occasion he threw a coin 24,000 times obtaining heads 
12,012 times, the frequency of occurrence of heads being 0.5005.
In all these experiments, the frequency only slightly deviated from 
the probability—0.5. 

2. Poisson's Theorem. If in a sequence of independent trials, the
probability of occurrence of an event $A$ in the $k$th trial is equal to
$p_k$, then

$$\lim_{n \to \infty} P\left\{ \left| \frac{\mu}{n} - \frac{p_1 + p_2 + \ldots + p_n}{n} \right| < \varepsilon \right\} = 1$$

where, as usual, $\mu$ denotes the number of occurrences of $A$ in the
first $n$ trials.

Introducing the random variables $\mu_k$, which are equal to the
number of occurrences of $A$ in the $k$th trial, and noting that

$$M\mu_k = p_k, \qquad D\mu_k = p_k q_k \le \frac{1}{4}$$

we find that the Poisson theorem is a special case of the
Chebyshev theorem.





TABLE 11

Trial  Red    Favourable  Frequency  Trial  Red    Favourable  Frequency
No.    cards  cases                  No.    cards  cases

1      8      0           0.00       51     9      13          0.25
2      9      1           0.50       52     8      13          0.25
3      11     1           0.33       53     7      13          0.25
4      9      2           0.50       54     9      14          0.26
5      11     2           0.40       55     7      14          0.26
6      8      2           0.33       56     9      15          0.27
7      11     2           0.29       57     9      16          0.28
8      9      3           0.37       58     11     16          0.28
9      8      3           0.33       59     8      16          0.27
10     7      3           0.30       60     8      16          0.27
11     12     3           0.27       61     8      16          0.26
12     10     3           0.25       62     10     16          0.26
13     9      4           0.31       63     12     16          0.25
14     13     4           0.29       64     9      17          0.27
15     12     4           0.27       65     11     17          0.26
16     8      4           0.25       66     12     17          0.26
17     11     4           0.23       67     11     17          0.26
18     10     4           0.22       68     8      17          0.25
19     8      4           0.21       69     10     17          0.25
20     11     4           0.20       70     8      17          0.25
21     12     4           0.19       71     7      17          0.24
22     10     4           0.18       72     9      18          0.25
23     10     4           0.17       73     10     18          0.25
24     9      5           0.21       74     8      18          0.24
25     9      6           0.24       75     11     18          0.24
26     14     6           0.23       76     8      18          0.24
27     9      7           0.26       77     9      19          0.25
28     10     7           0.25       78     9      20          0.26
29     10     7           0.24       79     5      20          0.26
30            7           0.23       80     8      20          0.25
31            7           0.22       81            20          0.25
32            7           0.22       82            20          0.24
33            7           0.21       83            21          0.25
34            7           0.21       84     6      21          0.25
35     9      8           0.23       85     10     21          0.25
36     9      9           0.25       86     10     21          0.24
37     10     9           0.24       87     9      22          0.25
38     10     9           0.24       88     7      22          0.25
39     8      9           0.23       89     7      22          0.25
40     7      9           0.22       90     10     22          0.24
41     9      10          0.24       91     8      22          0.24
42     10     10          0.24       92     8      22          0.24
43     10     10          0.23       93     10     22          0.24
44     9      11          0.25       94     8      22          0.23
45     8      11          0.24       95     11     22          0.23
46            11          0.24       96     9      23          0.24
47            11          0.23       97     9      24          0.25
48            12          0.25       98     10     24          0.25
49     6      12          0.25       99     7      24          0.24
50     7      12          0.24       100    7      24          0.24



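3. If a sequence of pairwise independent random variables
$\xi_1, \xi_2, \dots, \xi_n, \dots$ is such that

$$M\xi_1 = M\xi_2 = \dots = M\xi_n = \dots = a$$

and

$$D\xi_1 \le C, \quad D\xi_2 \le C, \quad \dots, \quad D\xi_n \le C, \quad \dots$$

then for any constant $\varepsilon > 0$

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - a \right| < \varepsilon \right\} = 1$$
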
This special case of the Chebyshev theorem serves as a basis for
the rule of the arithmetic mean that is constantly employed in the
theory of measurements. Suppose we are measuring a certain physical
quantity $a$. Repeating the measurement $n$ times under identical
conditions, the observer will obtain results $x_1, x_2, \dots, x_n$
that do not exactly coincide. The rule is to obtain an approximate
value of $a$ by taking the arithmetic mean of the observational
results:

$$a \approx \frac{x_1 + x_2 + \dots + x_n}{n}$$

If the measurements do not exhibit a systematic error, that is, if

$$Mx_1 = Mx_2 = \dots = Mx_n = a$$

and if there is no uncertainty about the observed values themselves,
then according to the law of large numbers, for sufficiently
large values of $n$ we can in this way obtain, with a probability
arbitrarily close to one, a value that is arbitrarily close to the
desired quantity $a$.
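
The rule of the arithmetic mean can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the measurement errors are taken to be Gaussian with zero mean (so there is no systematic error), and the true value, the noise level and the function name `mean_of_measurements` are hypothetical choices made for the illustration.

```python
import random


def mean_of_measurements(a, sigma, n, seed=0):
    """Arithmetic mean of n simulated measurements of the quantity a.

    Assumption: each measurement equals a plus zero-mean Gaussian noise
    with standard deviation sigma, so every measurement has expectation a.
    """
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    return sum(rng.gauss(a, sigma) for _ in range(n)) / n


# Deviation of the arithmetic mean from the (hypothetical) true value 9.81
for n in (10, 1000, 100000):
    print(n, abs(mean_of_measurements(9.81, 0.5, n) - 9.81))
```

For a typical run the printed deviations shrink as $n$ grows, roughly like $\sigma/\sqrt{n}$, which is what the Chebyshev bound leads one to expect.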

The Chebyshev inequality permits obtaining a stronger result 
in the case of identically distributed independent terms. 

Khinchin's Theorem. If the random variables $\xi_1, \xi_2, \dots$ are
independent and identically distributed with finite expectations
($a = M\xi_n$), then as $n \to \infty$

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - a \right| < \varepsilon \right\} \to 1$$

Proof. We take advantage of a device which was first employed 
by A. A. Markov in 1907 and later became known as the method 
of truncation. This procedure is frequently used in modern proba¬ 
bility theory. 








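Define the new random variables by the following rule: let
$\delta > 0$ be fixed and for $k = 1, 2, \dots, n$

$$\eta_k = \xi_k, \quad \zeta_k = 0, \quad \text{if } |\xi_k| < \delta n$$
$$\eta_k = 0, \quad \zeta_k = \xi_k, \quad \text{if } |\xi_k| \ge \delta n$$

It is obvious that for any $k$ ($1 \le k \le n$)

$$\xi_k = \eta_k + \zeta_k$$

For the variables $\eta_k$ there exist expectation and variance

$$a_n = M\eta_k = \int_{-\delta n}^{\delta n} x \, dF(x)$$

$$D\eta_k = \int_{-\delta n}^{\delta n} x^2 \, dF(x) - a_n^2 \le \delta n \int_{-\delta n}^{\delta n} |x| \, dF(x) \le \delta b n$$

where $b = \int_{-\infty}^{\infty} |x| \, dF(x)$. Since as $n \to \infty$

$$a_n \to a$$

it follows that for any $\varepsilon > 0$, given sufficiently large $n$,

$$|a_n - a| < \varepsilon \tag{4}$$

By virtue of Chebyshev's inequality

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \eta_k - a_n \right| \ge \varepsilon \right\} \le \frac{b\delta}{\varepsilon^2}$$

Now using inequality (4) we get:

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \eta_k - a \right| \ge 2\varepsilon \right\} \le \frac{b\delta}{\varepsilon^2}$$

Now note that

$$P\{\zeta_n \ne 0\} = \int_{|x| \ge \delta n} dF(x) \le \frac{1}{\delta n} \int_{|x| \ge \delta n} |x| \, dF(x)$$

Since expectation exists, the right-hand side becomes less than
$\delta / n$ for sufficiently large $n$. But

$$P\left\{ \sum_{k=1}^{n} \zeta_k \ne 0 \right\} \le \sum_{k=1}^{n} P\{\zeta_k \ne 0\} \le \delta$$

and so

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - a \right| \ge 2\varepsilon \right\} \le P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \eta_k - a \right| \ge 2\varepsilon \right\} + P\left\{ \sum_{k=1}^{n} \zeta_k \ne 0 \right\} \le \frac{b\delta}{\varepsilon^2} + \delta$$

Since $\varepsilon$ and $\delta$ are arbitrary, the right-hand side may be made less
than any number; this proves the theorem.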
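
The two facts on which the method of truncation rests can be checked in closed form for a concrete law. The sketch below assumes, purely for illustration, a Pareto distribution with density $1.5\,x^{-2.5}$ on $x \ge 1$ (expectation 3, infinite variance); the function names are hypothetical. It evaluates the expectation of the truncated variable at the level $\delta n$ and the bound $n P\{|\xi| \ge \delta n\}$ on the probability that at least one of the discarded parts is nonzero.

```python
def truncated_mean(level):
    """Integral of x dF(x) over 1 <= x < level for the assumed Pareto law
    with density 1.5 * x**-2.5 on x >= 1 (its full expectation is 3)."""
    return 3.0 * (1.0 - level ** -0.5)


def tail_probability(level):
    """P{xi >= level} for the same law (valid for level >= 1)."""
    return level ** -1.5


def truncation_report(n, delta):
    """Return (a_n, bound), where a_n is the expectation of the variable
    truncated at delta * n and bound = n * P{xi >= delta * n} dominates
    the probability that some discarded part is nonzero among n terms."""
    level = delta * n
    return truncated_mean(level), n * tail_probability(level)


for n in (10**4, 10**6, 10**8, 10**10):
    print(n, truncation_report(n, 0.01))
```

As $n$ grows with $\delta$ fixed, the first number approaches the expectation 3 and the second tends to zero, even though the variance of this law is infinite.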

We also formulate Markov’s theorem; its proof is an obvious 
consequence of Chebyshev’s inequality. 

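Markov's Theorem. If a sequence of random variables
$\xi_1, \xi_2, \dots, \xi_n, \dots$ is such that as $n \to \infty$

$$\frac{1}{n^2} D\left( \sum_{k=1}^{n} \xi_k \right) \to 0 \tag{5}$$

then, for any positive constant $\varepsilon$,

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \right| < \varepsilon \right\} = 1$$

If the random variables $\xi_1, \xi_2, \dots, \xi_n, \dots$ are pairwise
independent, the Markov condition becomes (as $n \to \infty$):

$$\frac{1}{n^2} \sum_{k=1}^{n} D\xi_k \to 0$$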

From this it is evident that the Chebyshev theorem is a special 
case of Markov’s theorem. 
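
For pairwise independent summands the Markov condition is a statement about the single numerical sequence $\frac{1}{n^2}\sum_{k \le n} D\xi_k$, so it is easy to tabulate. The sketch below is illustrative only; the helper `markov_ratio` and the two variance sequences are hypothetical examples, not taken from the text.

```python
def markov_ratio(variance, n):
    """(1/n^2) * sum_{k=1}^{n} D(xi_k), where variance(k) returns D(xi_k).

    This is the left-hand side of the Markov condition for pairwise
    independent random variables.
    """
    return sum(variance(k) for k in range(1, n + 1)) / n ** 2


# Variances bounded by a constant (the Chebyshev case): the ratio is 4/n.
print([markov_ratio(lambda k: 4.0, n) for n in (10, 100, 1000)])

# Variances growing like k: the ratio is (n + 1)/(2n) and tends to 1/2,
# so the Markov condition fails for this sequence.
print([markov_ratio(lambda k: float(k), n) for n in (10, 100, 1000)])
```

The first sequence shows why bounded variances are enough; the second shows that the condition is not automatic once the variances are allowed to grow.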

We obtain the following theorem as a direct consequence of 
Markov’s theorem. It was also proved by Markov. 

Theorem. Let $\mu$ be the number of occurrences of an event $E$ in $n$
trials connected in a homogeneous Markov chain, and let $p_1, p_2, \dots$
be the probabilities of occurrence of $E$ in the first, second, and so
forth trials, respectively; then, for any $\varepsilon > 0$,

$$\lim_{n \to \infty} P\left\{ \left| \frac{\mu}{n} - \frac{1}{n} \sum_{k=1}^{n} p_k \right| < \varepsilon \right\} = 1 \tag{6}$$

The proof of the theorem is obvious by virtue of the results of
Example 6 in Sec. 28.

Since according to the results of this example

$$\frac{1}{n} \sum_{k=1}^{n} p_k = p + o(1)$$

it follows that (6) is equivalent to the equation

$$\lim_{n \to \infty} P\left\{ \left| \frac{\mu}{n} - p \right| < \varepsilon \right\} = 1$$

In this form, the foregoing theorem is completely analogous to 
Bernoulli’s theorem. 


Sec. 33. A Necessary and Sufficient Condition 
for the Law of Large Numbers 


We have already pointed out that the law of large numbers is 
one of the basic propositions of probability theory. This makes it 
clear why so much effort has gone into establishing the broadest 
possible conditions to be satisfied by the variables $\xi_1, \xi_2, \dots, \xi_n, \dots$
so that the law of large numbers should hold.

The history of the problem is as follows. At the end of the 17th 
century and the beginning of the 18th, James Bernoulli proved a 
theorem that bears his name. This theorem of Bernoulli was first pub¬ 
lished in 1713, after the author’s death, in the treatise Ars conjectandi 
(the art of constructing conjectures). Then at the beginning of the 
19th century, Poisson proved a similar theorem under more general 
conditions. No further advances were made up to the middle of the 
19th century. In 1866 the great Russian mathematician P. L. Che- 
byshev discovered a method that we have given in Sec. 32. Later, 
A. A. Markov noticed that Chebyshev’s reasoning permits of a still 
more general result (see Sec. 32). 

Further efforts to attain fundamental advances failed until 1926,
when A. N. Kolmogorov obtained conditions necessary and sufficient
for a sequence of mutually independent random variables
$\xi_1, \xi_2, \dots, \xi_n, \dots$ to obey the law of large numbers. In 1928, A. Ya.
Khinchin demonstrated that if the random variables $\xi_n$ are not only
independent but are also identically distributed, then the existence
of the expectation $M\xi_n$ is a sufficient condition for applying the law
of large numbers.

During recent years many papers have been devoted to determi¬ 
ning the conditions to be imposed on dependent variables so that 
the law of large numbers may be applicable. The Markov theorem 
is a proposition of this type. 




By employing the Chebyshev method it is easy to obtain a condi¬ 
tion similar to Markov’s condition, but this time not only sufficient 
but also necessary for the applicability of the law of large numbers 
to a sequence of arbitrary random variables. 

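Theorem. So that a sequence of random variables

$$\xi_1, \xi_2, \xi_3, \dots$$

(arbitrarily dependent) should, for any positive $\varepsilon$, satisfy the relation

$$\lim_{n \to \infty} P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \right| < \varepsilon \right\} = 1 \tag{1}$$

it is necessary and sufficient that as $n \to \infty$

$$M \frac{\left( \sum_{k=1}^{n} (\xi_k - M\xi_k) \right)^2}{n^2 + \left( \sum_{k=1}^{n} (\xi_k - M\xi_k) \right)^2} \to 0 \tag{2}$$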


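Proof. First suppose that (2) is satisfied; we will show that in
this case (1) will also be satisfied. Denote by $\Phi_n(x)$ the
distribution function of the variable

$$\eta_n = \frac{1}{n} \sum_{k=1}^{n} (\xi_k - M\xi_k)$$

It is easy to verify the following chain of relations

$$P\left\{ \left| \frac{1}{n} \sum_{k=1}^{n} (\xi_k - M\xi_k) \right| \ge \varepsilon \right\} = P\{|\eta_n| \ge \varepsilon\} = \int_{|x| \ge \varepsilon} d\Phi_n(x) \le$$
$$\le \frac{1 + \varepsilon^2}{\varepsilon^2} \int_{|x| \ge \varepsilon} \frac{x^2}{1 + x^2} \, d\Phi_n(x) \le \frac{1 + \varepsilon^2}{\varepsilon^2} \int \frac{x^2}{1 + x^2} \, d\Phi_n(x) = \frac{1 + \varepsilon^2}{\varepsilon^2} M \frac{\eta_n^2}{1 + \eta_n^2} \, {}^*$$

This inequality proves that the conditions of the theorem are
sufficient.


* We can write this equality on the basis of the formula

$$Mf(\xi) = \int f(x) \, dF_\xi(x)$$

(see Theorem 1, Sec. 27).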





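Let us now show that condition (2) is necessary. It will readily
be seen that

$$P\{|\eta_n| \ge \varepsilon\} = \int_{|x| \ge \varepsilon} d\Phi_n(x) \ge \int_{|x| \ge \varepsilon} \frac{x^2}{1 + x^2} \, d\Phi_n(x) = \int \frac{x^2}{1 + x^2} \, d\Phi_n(x) - \int_{|x| < \varepsilon} \frac{x^2}{1 + x^2} \, d\Phi_n(x) \ge$$
$$\ge \int \frac{x^2}{1 + x^2} \, d\Phi_n(x) - \varepsilon^2 = M \frac{\eta_n^2}{1 + \eta_n^2} - \varepsilon^2$$

Thus,

$$0 \le M \frac{\eta_n^2}{1 + \eta_n^2} \le \varepsilon^2 + P\{|\eta_n| \ge \varepsilon\}$$

First choose $\varepsilon$ sufficiently small and then $n$ sufficiently large; by
doing so we can make the right-hand side of the last inequality as
small as desired.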
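
Both halves of the proof are two-sided estimates between the quantity in condition (2) and the probability $P\{|\eta_n| \ge \varepsilon\}$, and they can be checked exactly in a simple case. The sketch below assumes Bernoulli trials, so that $\eta_n = (\mu - np)/n$ with $\mu$ binomially distributed; the function names are hypothetical.

```python
from math import comb


def eta_distribution(n, p):
    """Exact law of eta_n = (mu - n*p)/n for mu ~ Binomial(n, p),
    returned as a list of (value, probability) pairs."""
    q = 1.0 - p
    return [((k - n * p) / n, comb(n, k) * p**k * q**(n - k))
            for k in range(n + 1)]


def condition_two(n, p):
    """M[eta_n^2 / (1 + eta_n^2)], the quantity appearing in condition (2)."""
    return sum(w * x * x / (1.0 + x * x) for x, w in eta_distribution(n, p))


def tail(n, p, eps):
    """P{|eta_n| >= eps}."""
    return sum(w for x, w in eta_distribution(n, p) if abs(x) >= eps)


eps = 0.05
for n in (100, 200, 400):
    c, t = condition_two(n, 0.3), tail(n, 0.3, eps)
    # sufficiency estimate, then necessity estimate
    print(n, t <= (1 + eps**2) / eps**2 * c, c <= eps**2 + t, c)
```

Under these assumptions both comparisons hold for every $n$ tried, and the last column decreases roughly like $pq/n$.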

We note that all the theorems that were proved in the preceding
section follow readily from the general proposition that has just
been proved. Indeed, since for any $n$ and any $\xi_k$ the inequality

$$\frac{\eta_n^2}{1 + \eta_n^2} \le \eta_n^2 = \left[ \frac{1}{n} \sum_{k=1}^{n} (\xi_k - M\xi_k) \right]^2$$

is valid, it follows that if variances exist we get the following
inequality:

$$M \frac{\eta_n^2}{1 + \eta_n^2} \le \frac{1}{n^2} D\left( \sum_{k=1}^{n} \xi_k \right)$$

Thus, if the Markov condition is satisfied, then condition (2) is
also satisfied and, consequently, the sequence $\xi_1, \xi_2, \dots, \xi_n, \dots$ obeys
the law of large numbers.

Still, we must take note of the fact that in more complicated cases 
when the variables are not assumed to have finite variances, the 
theorem just proved is of extremely slight use in any real verifica¬ 
tion of the applicability of the law of large numbers, for condition 
(2) refers not to the separate summands but to their sums. However, 
it is apparently impossible to hope to find necessary and sufficient 
conditions (and what is more, such as to be convenient for applica¬ 
tions) without making any assumption about the variables and the 
relationship existing between them. 

Attempts at a practical employment of the theorems that have just 
been proved come up against one fundamental difficulty: can we 





take it that the phenomenon or the production process under study 
proceeds via the action of independent causes? Is there not a contra¬ 
diction between the very concept of independence and our basic ideas 
of the interrelationship of phenomena in the external world? In a 
mathematical study of the phenomena of nature, technical processes 
or social phenomena we must first of all derive our basic premises on 
the basis of a profound study of the essence of the phenomenon at hand 
and its qualitative peculiarities. We have to take into account chan¬ 
ges in the external conditions in which our phenomenon develops 
and alter the mathematical apparatus and the premises underlying 
its applications as soon as it is seen that the conditions of realization 
of the phenomenon have changed. 

As a first approximation to reality we can assume that the causes
operating on the phenomenon are independent, and we can draw
conclusions from this supposition. How successful our scheme of the
phenomenon is, and how suitable the mathematical apparatus is for its
study, can be judged by the agreement between the theory we have
constructed and practice. If our theoretical results depart
substantially from experiment, then we will have to revise the premises; in
particular, if it is a question of applying the law of large numbers,
we may have to give up the supposition of total independence of the
operating causes and presume them to be dependent, albeit weakly.

We have already spoken of the fact that accumulated experience 
in the use of the theorems of the law of large numbers indicates that 
the condition of independence is satisfactory in many important 
problems of natural science and technology. 


Sec. 34. The Strong Law of Large Numbers 

It often happens that an entirely unjustified conclusion is drawn
from Bernoulli's theorem, namely that the frequency of an event $A$ tends to
the probability of $A$ as the number of trials increases without limit.
Actually, Bernoulli's theorem states only that for a sufficiently
large number of trials $n$ the probability of one single inequality

$$\left| \frac{\mu}{n} - p \right| < \varepsilon$$

becomes greater than $1 - \eta$ for an arbitrary $\eta > 0$. In 1909 the French
mathematician E. Borel discovered a more profound proposition which
became known as the strong law of large numbers. The formulation
and proof of the Borel theorem and also of the more general
propositions of Kolmogorov require the introduction of an important concept:
the convergence of a sequence of random variables.




Let there be a sequence of random variables defined on one and the
same set of elementary events $U$:

$$\xi_n = f_n(e) \quad (e \in U) \tag{1}$$

We consider the set $A$ of all the elementary events $e$ for which the
sequence $f_n(e)$ converges. Let $f(e)$ denote the limit of $f_n(e)$ at the
point $e$. If we let $A_{nk}^{r}$ denote the set of those $e$ for which the inequality

$$|f_{n+k}(e) - f(e)| < \frac{1}{r} \tag{2}$$

is fulfilled, then it is obvious that

$$A = \prod_{r=1}^{\infty} \sum_{n=1}^{\infty} \prod_{k=1}^{\infty} A_{nk}^{r} \tag{3}$$

(here products denote intersections and sums denote unions of events).
Indeed, if the sequence of functions $f_n(e)$ converges at the point $e$,
then: (1) inequalities (2) must be satisfied for all $k$ if $n$ is sufficiently
great; (2) they must be satisfied beginning with a certain $n$; (3) they
must be satisfied for sufficiently large $n$ for any value of $r$. Equation
(3) states these three requirements symbolically. In accordance with
Equation (3) and the definition of a random event, the subset $A$
belongs to the field of random events. We define a random variable $\xi$
as follows: if $e \in A$, then $\xi = f(e)$, but if $e \notin A$, then $\xi = 0$.

If the probability of the random event $A$ is equal to 1, then we say that
the sequence of random variables $\xi_n$ converges to the random variable $\xi$
almost certainly (or that it converges with probability one*).

If the sequence $\xi_n$ converges to $\xi$ almost certainly, we then write
this fact as follows:

$$P\{\xi_n \to \xi\} = 1 \tag{4}$$

This equation can clearly be written differently:

$$P\{\xi_n \not\to \xi\} = 0 \tag{4'}$$

The latter expression signifies that the probability that there will
be a number $r$ such that for all $n$ and at least for one value of
$k = k(n)$ the inequality

$$|\xi_{n+k} - \xi| \ge \frac{1}{r}$$

holds is equal to zero.


* The concept of almost certain convergence corresponds exactly to that of
convergence almost everywhere in the theory of functions.

In probability theory a big role is also played by the so-called convergence
in probability: a sequence of random variables $\xi_n$ converges in probability to the
random variable $\xi$ if, for any $\varepsilon > 0$, the probability of the inequality
$|\xi_n - \xi| < \varepsilon$ tends to unity as $n \to \infty$.

Convergence in probability is an analogue of the convergence in measure of
a sequence of functions in the theory of functions.

It is obvious that the law of large numbers asserts that under certain
circumstances the sums $\frac{1}{n} \sum_{k=1}^{n} (\xi_k - M\xi_k)$ converge in probability to zero.

We now indicate a sufficient condition for the convergence of 
a sequence of random variables with probability one. 

Lemma. If for any positive integer $r$,

$$\sum_{n=1}^{\infty} P\left\{ |\xi_n - \xi| \ge \frac{1}{r} \right\} < +\infty \tag{5}$$

then (4) or, what is the same thing, (4') holds.

Proof. Let $E_n^r$ denote an event which consists in the fact that
the inequality

$$|\xi_n - \xi| \ge \frac{1}{r}$$

is valid.

We further assume

$$S_n^r = \sum_{k=1}^{\infty} E_{n+k}^r$$

From the fact that

$$P\{S_n^r\} \le \sum_{k=1}^{\infty} P\{E_{n+k}^r\} = \sum_{k=n+1}^{\infty} P\left\{ |\xi_k - \xi| \ge \frac{1}{r} \right\}$$

we derive, by virtue of (5), the equation

$$\lim_{n \to \infty} P\{S_n^r\} = 0 \tag{6}$$

Now let

$$S^r = S_1^r S_2^r S_3^r \cdots$$

From the fact that the event $S^r$ implies any one of the events $S_n^r$
we get, by virtue of (6),

$$P\{S^r\} = 0 \tag{7}$$

Finally, set

$$S = S^1 + S^2 + S^3 + \dots$$

It is easy to establish that this event signifies that an $r$ will be
found such that for every $n$ ($n = 1, 2, 3, \dots$) the inequalities

$$|\xi_{n+k} - \xi| \ge \frac{1}{r}$$

will be satisfied for at least one $k$ [$k = k(n)$]. Since

$$P\{S\} \le \sum_{r=1}^{\infty} P\{S^r\}$$

it follows that by (7)

$$P\{S\} = 0$$

and this completes the proof.

Repeating word for word the foregoing reasoning, we can obtain
a somewhat stronger proposition:

If there exists a sequence of integers $1 = n_1 < n_2 < n_3 < \dots$ such
that the series

$$\sum_{k=1}^{\infty} P\left\{ \max_{n_k \le n < n_{k+1}} |\xi_n - \xi| \ge \frac{1}{r} \right\}$$

converges for every positive integer $r$, then the sequence of random
variables $\xi_1, \xi_2, \dots$ almost certainly converges to $\xi$.

Let us now apply the newly introduced concept and the lemma 
that we have proved. 


Borel's Theorem. Let $\mu$ be the number of occurrences of an event
$A$ in $n$ independent trials, in each of which $A$ may occur with
probability $p$. Then, as $n \to \infty$,

$$P\left\{ \frac{\mu}{n} \to p \right\} = 1$$

Proof. According to the lemma, it suffices to detect convergence
of the series

$$\sum_{n=1}^{\infty} P\left\{ \left| \frac{\mu}{n} - p \right| \ge \frac{1}{r} \right\} \tag{8}$$

for any natural number $r$. For this purpose we note that in the
very same way that Chebyshev's inequality was proved (Sec. 32)
it is possible to establish the following inequality: for every random
variable $\xi$ for which $M(\xi - M\xi)^4$ exists,

$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{1}{\varepsilon^4} M(\xi - M\xi)^4$$

Thus,

$$P\left\{ \left| \frac{\mu}{n} - p \right| \ge \frac{1}{r} \right\} \le r^4 M\left( \frac{\mu}{n} - p \right)^4$$

As we have repeatedly done, let us introduce the auxiliary random
variables $\mu_i$ equal to the number of occurrences of event $A$ in the
$i$th trial. Since

$$\mu = \sum_{i=1}^{n} \mu_i$$

it follows that

$$M\left( \frac{\mu}{n} - p \right)^4 = \frac{1}{n^4} \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} M(\mu_i - p)(\mu_j - p)(\mu_k - p)(\mu_l - p) \tag{9}$$

Since $M(\mu_i - p) = 0$, all the terms having at least one of the
factors $(\mu_i - p)$ to the first power vanish. Therefore, in this sum
only terms of the form $M(\mu_i - p)^4$ and $M(\mu_i - p)^2 (\mu_s - p)^2$ are
different from zero. It is clear that

$$M(\mu_i - p)^4 = pq(p^3 + q^3)$$

and

$$M(\mu_i - p)^2 (\mu_s - p)^2 = p^2 q^2 \quad (i \ne s)$$

The number of summands of the first type is $n$, the number of
the second type is $3n(n - 1)$. Indeed, $i$ may coincide either with $j$
or with $k$ or with $l$ and then take on one of the $n$ values from
1 to $n$; $s$ can assume only one of the $n - 1$ values because $s \ne i$.
Thus,

$$M\left( \frac{\mu}{n} - p \right)^4 = \frac{pq}{n^4} \left[ n(p^3 + q^3) + 3pq(n^2 - n) \right] < \frac{1}{4n^2}$$

and, consequently, the series (8) converges. The proof of the theorem
is complete.
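
The fourth-moment computation in the proof of Borel's theorem can be verified with exact rational arithmetic. The following sketch compares the moment computed directly from the binomial distribution with the closed form $\frac{pq}{n^4}[n(p^3+q^3) + 3pq(n^2-n)]$ and with the bound $\frac{1}{4n^2}$; the function names and the sample values of $n$ and $p$ are hypothetical.

```python
from fractions import Fraction
from math import comb


def fourth_moment_exact(n, p):
    """M(mu/n - p)^4 computed directly from the law of mu ~ Binomial(n, p).
    Pass p as a Fraction to keep the arithmetic exact."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) * (Fraction(k, n) - p) ** 4
               for k in range(n + 1))


def fourth_moment_formula(n, p):
    """The closed form pq[n(p^3 + q^3) + 3pq(n^2 - n)] / n^4."""
    q = 1 - p
    return p * q * (n * (p**3 + q**3) + 3 * p * q * (n * n - n)) / n**4


p = Fraction(1, 3)
for n in (1, 5, 12, 40):
    exact = fourth_moment_exact(n, p)
    print(n, exact == fourth_moment_formula(n, p), exact < Fraction(1, 4 * n * n))
```

Since the terms of series (8) are then bounded by $r^4/(4n^2)$, its convergence follows by comparison with $\sum 1/n^2$.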

Borel's theorem sparked off a whole cycle of investigations devoted
to seeking the conditions under which the so-called strong law of
large numbers holds.

We say that a sequence of random variables

$$\xi_1, \xi_2, \xi_3, \dots$$

obeys the strong law of large numbers if with probability one, as
$n \to \infty$,

$$\frac{1}{n} \sum_{k=1}^{n} \xi_k - \frac{1}{n} \sum_{k=1}^{n} M\xi_k \to 0$$

Broad yet simple sufficient conditions for the validity of the
strong law of large numbers are given by a theorem of A. N. Kolmogorov
whose proof is based on an interesting generalization of
Chebyshev's inequality.

Kolmogorov's Inequality. If the mutually independent random
variables $\xi_1, \xi_2, \dots, \xi_n$ have finite variances, then the probability of
the simultaneous realization of the inequalities

$$\left| \sum_{s=1}^{k} (\xi_s - M\xi_s) \right| < \varepsilon \quad (k = 1, 2, \dots, n)$$

is not less than

$$1 - \frac{1}{\varepsilon^2} \sum_{k=1}^{n} D\xi_k$$


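Proof. We introduce the notations

$$\eta_k = \xi_k - M\xi_k, \qquad S_k = \sum_{j=1}^{k} \eta_j$$

Also, let $E_k$ denote the event that

$$|S_j| < \varepsilon \ \text{ for } j \le k - 1 \quad \text{and} \quad |S_k| \ge \varepsilon \tag{10}$$

$E_0$ denotes the event that $|S_j| < \varepsilon$ for $j \le n$.

Since the event that consists in the fact that at least for one
$k$ ($1 \le k \le n$) the inequality

$$|S_k| \ge \varepsilon \quad (k = 1, 2, \dots, n)$$

will hold (in other words, that $\max_{1 \le k \le n} |S_k| \ge \varepsilon$) is equivalent to the
event $\sum_{k=1}^{n} E_k$, it follows by virtue of the incompatibility of the
events $E_k$ that

$$P\left\{ \max_{1 \le k \le n} |S_k| \ge \varepsilon \right\} = \sum_{k=1}^{n} P(E_k)$$

According to (5) of Sec. 26,

$$DS_n = \sum_{k=0}^{n} P(E_k) \cdot M(S_n^2 / E_k) \ge \sum_{k=1}^{n} P(E_k) \cdot M(S_n^2 / E_k)$$

Clearly, then,

$$M(S_n^2 / E_k) = M\left( S_k^2 + 2 \sum_{j > k} S_k \eta_j + \sum_{j > k} \eta_j^2 + 2 \sum_{j > h > k} \eta_j \eta_h \Big/ E_k \right) \ge$$
$$\ge M\left( S_k^2 + 2 \sum_{j > k} S_k \eta_j + 2 \sum_{j > h > k} \eta_j \eta_h \Big/ E_k \right)$$

Since the occurrence of the event $E_k$ imposes a restriction solely
on the values of the first $k$ of the variables $\xi_i$, while the subsequent
variables remain (given this condition) independent of one another
and of $S_k$, it follows that

$$M(S_k \eta_j / E_k) = M(S_k / E_k) \cdot M(\eta_j / E_k) = 0$$

and

$$M(\eta_h \eta_j / E_k) = 0 \quad (h \ne j, \ h > k, \ j > k \ge 1)$$

Also, in accordance with (10), the inequality

$$M(S_k^2 / E_k) \ge \varepsilon^2 \quad (k \ge 1)$$

holds.

We can therefore write

$$DS_n \ge \varepsilon^2 \sum_{k=1}^{n} P(E_k)$$

Whence

$$\sum_{k=1}^{n} P(E_k) = P\left\{ \max_{1 \le k \le n} |S_k| \ge \varepsilon \right\} \le \frac{DS_n}{\varepsilon^2} = \frac{1}{\varepsilon^2} \sum_{k=1}^{n} D\xi_k$$

This completes the proof of Kolmogorov's inequality.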
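
Kolmogorov's inequality can be compared with an exact computation in the simplest case. The sketch below assumes a symmetric random walk, that is, independent steps equal to $+1$ or $-1$ with probability $1/2$ each (expectation 0, variance 1), and computes $P\{\max_{1 \le k \le n} |S_k| \ge \varepsilon\}$ by propagating the probability mass that has stayed strictly inside the band $|S| < \varepsilon$. The function names are hypothetical.

```python
def prob_max_abs_reaches(n, eps):
    """Exact P{max_{1<=k<=n} |S_k| >= eps} for a symmetric +-1 random walk
    (eps is a positive integer)."""
    alive = {0: 1.0}  # mass of paths that have kept |S_j| < eps so far
    for _ in range(n):
        nxt = {}
        for pos, w in alive.items():
            for step in (-1, 1):
                new = pos + step
                if abs(new) < eps:  # path is still strictly inside the band
                    nxt[new] = nxt.get(new, 0.0) + 0.5 * w
        alive = nxt
    return 1.0 - sum(alive.values())


def kolmogorov_bound(n, eps):
    """(1/eps^2) * sum of the variances; each +-1 step has variance 1."""
    return n / eps ** 2


for n, eps in ((2, 2), (100, 15), (100, 25)):
    print(n, eps, prob_max_abs_reaches(n, eps), kolmogorov_bound(n, eps))
```

For $n = 2$, $\varepsilon = 2$ the bound is attained (both sides equal $1/2$); for the larger cases the exact probability is noticeably smaller than the bound.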

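Kolmogorov's Theorem. If a sequence of mutually independent
random variables $\xi_1, \xi_2, \xi_3, \dots$ satisfies the condition

$$\sum_{n=1}^{\infty} \frac{D\xi_n}{n^2} < +\infty$$

then it obeys the strong law of large numbers.

Proof. We set

$$v_n = \frac{1}{n} \sum_{k=1}^{n} (\xi_k - M\xi_k) = \frac{S_n}{n}$$

Consider the probability

$$P_m = P\{\max |v_n| \ge \varepsilon, \ 2^m \le n < 2^{m+1}\}$$

Since

$$P_m \le P\{\max |S_n| \ge 2^m \varepsilon, \ 2^m \le n < 2^{m+1}\}$$

we have, by Kolmogorov's inequality,

$$P_m \le \frac{1}{(2^m \varepsilon)^2} \sum_{j < 2^{m+1}} D\xi_j$$

In accord with the remark made with respect to the lemma of
this section, to prove the theorem it is sufficient for us to find
that the following series converges:

$$\sum_{m=1}^{\infty} P_m$$

But according to the foregoing,

$$\sum_{m=1}^{\infty} P_m \le \sum_{m=1}^{\infty} \frac{1}{(2^m \varepsilon)^2} \sum_{j < 2^{m+1}} D\xi_j = \frac{1}{\varepsilon^2} \sum_{j=1}^{\infty} D\xi_j \, {\sum}' \, 2^{-2m}$$

where the sum ${\sum}'$ is extended over those values of $m$ for which
$2^{m+1} > j$.

We determine the number $p$ by means of the inequalities

$$2^p \le j < 2^{p+1}$$

Then

$${\sum}' \, 2^{-2m} \le \sum_{m=p}^{\infty} 2^{-2m} = \frac{4}{3} \cdot 2^{-2p} \le \frac{16}{3j^2}$$

Thus,

$$\sum_{m=1}^{\infty} P_m \le \frac{16}{3\varepsilon^2} \sum_{j=1}^{\infty} \frac{D\xi_j}{j^2} < +\infty$$

The proof of the theorem is consequently complete.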
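
The only computational step in this proof is the estimate of the inner sum of $2^{-2m}$ over those $m \ge 1$ with $2^{m+1} > j$ by $16/(3j^2)$. A minimal sketch of a check in exact arithmetic follows; the helper name `inner_sum` is hypothetical, and the geometric series is summed in closed form rather than term by term.

```python
from fractions import Fraction


def inner_sum(j):
    """Sum of 2^(-2m) over all integers m >= 1 with 2^(m+1) > j."""
    m0 = 1
    while 2 ** (m0 + 1) <= j:  # find the smallest admissible m
        m0 += 1
    # geometric series: sum_{m >= m0} 4^(-m) = (4/3) * 4^(-m0)
    return Fraction(4, 3) * Fraction(1, 4 ** m0)


print(all(inner_sum(j) <= Fraction(16, 3 * j * j) for j in range(1, 5000)))
```

The check passes for every $j$ in the range tried, in agreement with the inequality $2^{p+1} > j$ used in the text.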

The theorem just proved clearly contains the following result: 

Corollary. If the variances of the random variables $\xi_i$ are bounded
by one and the same constant $C$, the sequence of mutually independent
random variables $\xi_1, \xi_2, \xi_3, \dots$ obeys the strong law of large numbers.

We thus see that the strong law of large numbers holds not only 
for the Bernoulli scheme with constant probability of occurrence of 
an event A in each of the trials (Borel’s theorem) but also in the case 
of the Poisson scheme (the probability of event A depends on the num¬ 
ber of the trial). 

The theorem just proved enables us to obtain as a corollary one final 
result, which was also found by A. N. Kolmogorov. 


Theorem. The existence of expectation is a necessary and sufficient
condition for applying the strong law of large numbers to a sequence
of identically distributed and mutually independent random variables.

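Proof. From the existence of expectation follows the finiteness of
the integral $\int |x| \, dF(x)$, where $F(x)$ is the distribution function of
the random variables $\xi_n$. Therefore

$$\sum_{n=1}^{\infty} P\{|\xi| > n\} = \sum_{n=1}^{\infty} \sum_{k \ge n} P\{k < |\xi| \le k+1\} = \sum_{k=1}^{\infty} k P\{k < |\xi| \le k+1\} \le$$
$$\le \sum_{k=0}^{\infty} \int_{k < |x| \le k+1} |x| \, dF(x) \le \int |x| \, dF(x) < \infty \tag{11}$$

We introduce the random variables

$$\xi_n^* = \begin{cases} \xi_n & \text{for } |\xi_n| \le n \\ 0 & \text{for } |\xi_n| > n \end{cases}$$

We then obtain

$$D\xi_n^* \le M\xi_n^{*2} = \int_{-n}^{n} x^2 \, dF(x) \le \sum_{k=0}^{n-1} (k+1)^2 P\{k < |\xi| \le k+1\}$$

and

$$\sum_{n=1}^{\infty} \frac{D\xi_n^*}{n^2} \le \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} \frac{(k+1)^2}{n^2} P\{k < |\xi| \le k+1\} = \sum_{k=0}^{\infty} P\{k < |\xi| \le k+1\} (k+1)^2 \sum_{n > k} \frac{1}{n^2}$$

Since

$$\sum_{n > k} \frac{1}{n^2} < \frac{1}{(k+1)^2} + \frac{1}{k+1} \le \frac{2}{k+1}$$

we find, by (11),

$$\sum_{n=1}^{\infty} \frac{D\xi_n^*}{n^2} \le 2 \sum_{k=0}^{\infty} (k+1) P\{k < |\xi| \le k+1\} < \infty$$

That is, the sequence $\xi_n^*$ satisfies the strong law of large numbers.

It remains to show that this proves the theorem. To do this, it
will obviously be sufficient to detect that the probability of at
least one inequality

$$\xi_n \ne \xi_n^*$$

for $n \ge N$ tends to zero as $N \to \infty$. Indeed,

$$P\{\xi_n \ne \xi_n^* \text{ for any } n \ge N\} \le \sum_{n \ge N} P\{\xi_n \ne \xi_n^*\} = \sum_{n \ge N} P\{|\xi_n| > n\} \le \sum_{n \ge N} (n - N + 1) P\{n < |\xi| \le n+1\} \le$$
$$\le \sum_{n=N}^{\infty} n \int_{n < |x| \le n+1} dF(x) \le \sum_{n=N}^{\infty} \int_{n < |x| \le n+1} |x| \, dF(x) = \int_{|x| > N} |x| \, dF(x)$$

By hypothesis, the right-hand side of this inequality may be made
smaller than any preassigned number by choosing $N$ sufficiently large.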
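
What distinguishes the strong law from the ordinary one is that the whole tail of the sequence of arithmetic means stays close to the expectation, not just one mean at a time. The following simulation is a sketch under stated assumptions: the summands are taken to be exponentially distributed with expectation 1 (a hypothetical choice), and the function records the largest deviation of the running mean over a whole range of $n$.

```python
import random


def max_tail_deviation(n_from, n_to, seed=1):
    """Largest |(1/n) * sum_{k<=n} xi_k - 1| over n_from <= n <= n_to for
    independent exponential variables xi_k with expectation 1."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    total, worst = 0.0, 0.0
    for n in range(1, n_to + 1):
        total += rng.expovariate(1.0)
        if n >= n_from:
            worst = max(worst, abs(total / n - 1.0))
    return worst


for n_from in (100, 1000, 10000):
    print(n_from, max_tail_deviation(n_from, 50000))
```

By construction the printed maxima cannot increase as `n_from` grows, since each is taken over a sub-range of the previous one along the same simulated path; the strong law asserts that with probability one they tend to zero.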

The fundamental role of the strong law of large numbers in the theory 
of probability and in its applications is exceedingly great. Indeed, 
suppose for a moment that, say, in the case of identically distributed 
summands having a finite expectation, the strong law does not hold. 
Then we can assert with probability arbitrarily close to unity that 
instances will recur where the arithmetic mean of the observational 





results will be far removed from the expectation. And this would 
happen even in cases when the observations are performed without 
any systematic error and with complete definiteness. Would it then 
be possible to consider that the arithmetic mean of the observational 
results is close to the quantity being measured? And could we take it 
that under these conditions the arithmetic mean might be conside¬ 
red as an approximate value of the quantity being measured? This 
is doubtful. 


EXERCISES 


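1. Prove that if the random variable $\xi$ is such that $Me^{a\xi}$ exists ($a > 0$ is
constant), then

$$P\{\xi \ge \varepsilon\} \le \frac{Me^{a\xi}}{e^{a\varepsilon}}$$

2. Let $f(x) > 0$ be a nondecreasing function. Prove that if $Mf(|\xi - M\xi|)$
exists, then

$$P\{|\xi - M\xi| \ge \varepsilon\} \le \frac{Mf(|\xi - M\xi|)}{f(\varepsilon)}$$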

3. A sequence of independent and identically distributed random variables
$\{\xi_n\}$ is defined by the equalities

(a) $P\{\xi_n = 2^{k - \log k - 2 \log \log k}\} = \dfrac{1}{2^k}$ $(k = 1, 2, 3, \dots)$

(b) $P\{\xi_n = k\} = \dfrac{c}{k^2 \log^2 k}$

Prove that the law of large numbers is applicable to both sequences.

4. Prove that the law of large numbers may be applied to a sequence of
independent random variables $\{\xi_n\}$ such that

$$P\{\xi_n = n^{\alpha}\} = P\{\xi_n = -n^{\alpha}\} = \frac{1}{2}$$

if and only if $\alpha < \frac{1}{2}$.

5. Prove that if the independent random variables $\xi_1, \xi_2, \dots, \xi_n, \dots$ are
such that

$$\max_{1 \le k \le n} \int_{|x| \ge A} |x| \, dF_k(x) \to 0 \quad \text{when } A \to \infty$$

then the law of large numbers is applicable to the sequence $\{\xi_n\}$.

Hint. Employ the method used in the proof of Khinchin's theorem.

6. Using the result of the preceding problem, prove that if for a sequence of
independent random variables $\{\xi_n\}$ there exist numbers $\alpha > 1$ and $\beta$ such that
$M|\xi_n|^{\alpha} \le \beta$, then the law of large numbers is applicable to the sequence $\{\xi_n\}$
(Markov's theorem).

7. Given a sequence of random variables $\xi_1, \xi_2, \dots$ for which $D\xi_n \le c$ and
$R_{ij} \to 0$ as $|i - j| \to \infty$ ($R_{ij}$ is the correlation coefficient of $\xi_i$ and $\xi_j$), prove
that the law of large numbers is applicable to this sequence (Bernstein's theorem).



CHAPTER 7

Characteristic Functions


We have seen in the preceding chapters that probability theory
makes wide use of the methods and the analytical apparatus of
various divisions of mathematical analysis. A simple solution of an
extremely wide range of problems of probability theory, especially
those associated with the summation of independent random variables,
is obtainable by means of characteristic functions, the theory of which
has been developed in analysis and is known by the name of Fourier
transformations. This chapter deals with the basic properties of
characteristic functions.

Sec. 35. Definition and Elementary Properties 
of Characteristic Functions 

The characteristic function of a random variable $\xi$ is defined as
the expectation of the random variable $e^{it\xi}$.* If $F(x)$ is the
distribution function of the variable $\xi$, the characteristic function is, by the
theorem of Sec. 27,

$$f(t) = \int e^{itx} \, dF(x) \tag{1}$$

We agree, henceforth, to denote the characteristic function and the
corresponding distribution function by the same letters, lower-case
for the former and upper-case for the latter.

From the fact that $|e^{itx}| = 1$ for all real $t$ there follows the existence
of the integral (1) for all distribution functions; hence, a
characteristic function may be defined for every random variable.


* The letter $t$ stands for a real parameter. The expectation of the complex
random variable $\xi + i\eta$ is defined as $M\xi + iM\eta$. It is easy to verify that
Theorems 1, 2, and 3 of Sec. 28 are valid in this case as well.





Theorem 1. A characteristic function is uniformly continuous over
the whole line and satisfies the following relations:

$$f(0) = 1, \quad |f(t)| \le 1 \quad (-\infty < t < \infty) \tag{2}$$

Proof. The relations (2) follow immediately from the definition
of a characteristic function. Indeed, by (1)

$$f(0) = \int 1 \cdot dF(x) = 1$$

and

$$|f(t)| = \left| \int e^{itx} \, dF(x) \right| \le \int |e^{itx}| \, dF(x) = \int dF(x) = 1$$

It now remains to prove the uniform continuity of the function
$f(t)$. For this purpose let us consider the difference

$$f(t + h) - f(t) = \int e^{itx} (e^{ixh} - 1) \, dF(x)$$

and let us estimate its absolute value. We have

$$|f(t + h) - f(t)| \le \int |e^{ixh} - 1| \, dF(x)$$

Let $\varepsilon > 0$ be arbitrary; we choose $A$ sufficiently large so that

$$\int_{|x| > A} dF(x) < \frac{\varepsilon}{4}$$

and select $h$ so small that for $|x| < A$

$$|e^{ixh} - 1| < \frac{\varepsilon}{2}$$

Then

$$|f(t + h) - f(t)| \le \int_{-A}^{A} |e^{ixh} - 1| \, dF(x) + 2 \int_{|x| > A} dF(x) \le \varepsilon$$

This inequality proves the theorem.

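Theorem 2. If $\eta = a\xi + b$, where $a$ and $b$ are constants, then

$$f_\eta(t) = f_\xi(at) \, e^{ibt}$$

where $f_\eta(t)$ and $f_\xi(t)$ denote the characteristic functions of the
variables $\eta$ and $\xi$.

Proof. Indeed,

$$f_\eta(t) = Me^{it\eta} = Me^{it(a\xi + b)} = e^{itb} Me^{ita\xi} = e^{itb} f_\xi(at)$$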

Theorem 3. The characteristic function of the sum of two inde¬ 
pendent random variables is equal to the product of their charac¬ 
teristic functions. 





Proof. Let $\xi$ and $\eta$ be independent random variables and let
$\zeta = \xi + \eta$. Then, clearly, $e^{it\xi}$ and $e^{it\eta}$ will also be independent
random variables along with $\xi$ and $\eta$. From this it follows that

$$Me^{it\zeta} = Me^{it(\xi + \eta)} = M(e^{it\xi} e^{it\eta}) = Me^{it\xi} \cdot Me^{it\eta}$$

This proves the theorem.

Corollary. If

$$\xi = \xi_1 + \xi_2 + \dots + \xi_n$$

and each term is independent of the sum of the preceding terms, then
the characteristic function of the variable $\xi$ is equal to the product of
the characteristic functions of the summands.

The application of characteristic functions rests to a large extent 
on the property formulated in Theorem 3. As we saw in Sec. 24, the 
addition of independent random variables leads to an extremely com¬ 
plicated operation, the convolution of the distribution functions of 
the summands. With regard to characteristic functions, this complex 
operation is replaced by an extremely simple one, the simple multi¬ 
plication of characteristic functions. 
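
The replacement of convolution by multiplication can be seen directly for discrete distributions. In the sketch below a distribution is represented as a dictionary mapping values to probabilities; this representation and the function names are assumptions made for the illustration. The law of the sum is computed by explicit convolution and its characteristic function is compared with the product of the two individual ones.

```python
import cmath


def char_fn(dist, t):
    """f(t) = M e^{it xi} for a discrete law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def law_of_sum(d1, d2):
    """Law of xi + eta for independent xi ~ d1 and eta ~ d2 (convolution)."""
    out = {}
    for x, p in d1.items():
        for y, r in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * r
    return out


die = {k: 1.0 / 6.0 for k in range(1, 7)}  # a fair die
coin = {0: 0.5, 1: 0.5}                    # a fair coin
for t in (0.3, 1.7, -2.0):
    lhs = char_fn(law_of_sum(die, coin), t)
    rhs = char_fn(die, t) * char_fn(coin, t)
    print(t, abs(lhs - rhs))
```

The printed differences are at the level of floating-point rounding. The relations $f(0) = 1$ and $|f(t)| \le 1$ of Theorem 1 can be checked with the same `char_fn`.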

Theorem 4. If a random variable $\xi$ has an absolute moment of the
$n$th order, then the characteristic function of the variable $\xi$ is
differentiable $n$ times and when $k \le n$

$$f^{(k)}(0) = i^k M\xi^k \tag{3}$$

Proof. Indeed, formal differentiation of the characteristic
function $k$ times ($k \le n$) leads to the equation

$$f^{(k)}(t) = i^k \int x^k e^{itx} \, dF(x) \tag{4}$$

But

$$\left| \int x^k e^{itx} \, dF(x) \right| \le \int |x|^k \, dF(x)$$

and, consequently, by hypothesis of the theorem, it is bounded.
It follows from this that the integral (4) exists and differentiation
is legitimate. Putting $t = 0$ in (4) we find

$$f^{(k)}(0) = i^k \int x^k \, dF(x)$$

Expectation and variance are very simply expressed by means
of derivatives of the logarithm of the characteristic function.
Indeed, put

$$\psi(t) = \log f(t)$$

Then

$$\psi'(t) = \frac{f'(t)}{f(t)}$$

and

$$\psi''(t) = \frac{f''(t) f(t) - [f'(t)]^2}{f^2(t)}$$

Taking into account Equation (3) and that $f(0) = 1$, we find

$$\psi'(0) = f'(0) = iM\xi$$

and

$$\psi''(0) = f''(0) - [f'(0)]^2 = i^2 M\xi^2 - [iM\xi]^2 = -D\xi$$

Whence

$$M\xi = \frac{1}{i} \psi'(0), \qquad D\xi = -\psi''(0) \tag{5}$$

The $k$th derivative of the logarithm of the characteristic function
at the point 0, multiplied by $i^k$, is called the cumulant
(semi-invariant) of the $k$th order of the random variable.

As follows directly from Theorem 3, when independent random
variables are added their cumulants are added too.

We have just seen that the first two cumulants are the expectation
and the variance, that is, the first-order moment and a certain
rational function of the moments of first and second orders. It will be
readily seen, by means of computation, that the cumulant of any
order $k$ is an entire rational function of the first $k$ moments. By way
of illustration, we give the explicit expressions of cumulants of the
third and fourth orders:

$$i^3 \psi'''(0) = -\{M\xi^3 - 3M\xi^2 \cdot M\xi + 2[M\xi]^3\}$$

$$i^4 \psi^{IV}(0) = M\xi^4 - 4M\xi^3 M\xi - 3[M\xi^2]^2 + 12M\xi^2 [M\xi]^2 - 6[M\xi]^4$$

We now consider a few examples of characteristic functions. 



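Example 1. A random variable $\xi$ is distributed in accordance
with the normal law with expectation $a$ and variance $\sigma^2$. The
characteristic function of the variable $\xi$ is

$$\varphi(t) = \frac{1}{\sigma \sqrt{2\pi}} \int e^{itx - \frac{(x - a)^2}{2\sigma^2}} \, dx$$

By the substitution

$$z = \frac{x - a}{\sigma} - it\sigma$$

$\varphi(t)$ is reduced to the form

$$\varphi(t) = e^{iat - \frac{\sigma^2 t^2}{2}} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty - it\sigma}^{\infty - it\sigma} e^{-\frac{z^2}{2}} \, dz$$

It is known that for any real $\alpha$

$$\int_{-\infty - i\alpha}^{\infty - i\alpha} e^{-\frac{z^2}{2}} \, dz = \sqrt{2\pi}$$

hence,

$$\varphi(t) = e^{iat - \frac{\sigma^2 t^2}{2}}$$

Using Theorem 4 we can readily compute the central moments
for a normal distribution and in this alternative way obtain the
result of the example considered in Sec. 30.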

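Example 2. Find the characteristic function of a random
variable $\xi$ that is Poisson distributed.

By hypothesis, the variable $\xi$ assumes only integral values, and

$$P\{\xi = k\} = \frac{\lambda^k e^{-\lambda}}{k!} \quad (k = 0, 1, 2, \dots)$$

where $\lambda > 0$ is a constant.

The characteristic function of the variable $\xi$ is

$$f(t) = \sum_{k=0}^{\infty} e^{itk} P\{\xi = k\} = \sum_{k=0}^{\infty} e^{itk} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{it})^k}{k!} = e^{\lambda (e^{it} - 1)}$$

According to (5) we then find

$$M\xi = \frac{1}{i} \psi'(0) = \lambda, \qquad D\xi = -\psi''(0) = \lambda$$

The first of these equations was earlier obtained by us directly
(see Example 3, Sec. 26).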
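
The computation of expectation and variance from the logarithm of the characteristic function can be mimicked numerically. The sketch below assumes the Poisson law, for which $\psi(t) = \log f(t) = \lambda(e^{it} - 1)$, and approximates $\psi'(0)$ and $\psi''(0)$ by central differences; the step `h` and the function names are hypothetical choices.

```python
import cmath


def log_cf_poisson(t, lam):
    """psi(t) = log f(t) = lam * (e^{it} - 1) for the Poisson law."""
    return lam * (cmath.exp(1j * t) - 1.0)


def mean_and_variance(psi, h=1e-4):
    """M xi = psi'(0)/i and D xi = -psi''(0), with both derivatives
    approximated by central finite differences of step h."""
    d1 = (psi(h) - psi(-h)) / (2 * h)
    d2 = (psi(h) - 2 * psi(0.0) + psi(-h)) / (h * h)
    return (d1 / 1j).real, (-d2).real


print(mean_and_variance(lambda t: log_cf_poisson(t, 2.5)))
```

Both returned numbers are close to $\lambda = 2.5$, up to the discretization error of the finite differences.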

Example 3. A random variable $\xi$ is uniformly distributed over
the interval $(-a, a)$. The characteristic function is equal to

$$f(t) = \int_{-a}^{a} e^{itx} \frac{dx}{2a} = \frac{\sin at}{at}$$


Example 4. Find the characteristic function of the variable $\mu$,
which is equal to the number of occurrences of an event $A$ in $n$
independent trials, in each of which the probability that $A$ will
occur is $p$.

The variable $\mu$ may be represented in the form of a sum:

$$\mu = \mu_1 + \mu_2 + \dots + \mu_n$$

of $n$ independent variables, each of which takes on only two values,
0 and 1, with respective probabilities $q = 1 - p$ and $p$. The
variable $\mu_k$ assumes the value 1 if event $A$ occurs in the $k$th trial
and the value 0 if event $A$ does not occur in the $k$th trial.

The characteristic function of $\mu_k$ is equal to

$$f_k(t) = Me^{it\mu_k} = e^{it \cdot 0} q + e^{it \cdot 1} p = q + pe^{it}$$

According to Theorem 3, the characteristic function of the variable
$\mu$ is

$$f(t) = \prod_{k=1}^{n} f_k(t) = (q + pe^{it})^n$$

Let us also find the characteristic function of the variable
$\eta = \dfrac{\mu - np}{\sqrt{npq}}$. By Theorem 2 it is

$$f_\eta(t) = e^{-it\sqrt{\frac{np}{q}}} \left( q + pe^{\frac{it}{\sqrt{npq}}} \right)^n = \left( q e^{-it\sqrt{\frac{p}{nq}}} + p e^{it\sqrt{\frac{q}{np}}} \right)^n$$
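
The formulas of this example lend themselves to a direct numerical check. The sketch below, with hypothetical function names, computes the characteristic function of $\mu$ once from the binomial probabilities and once as $(q + pe^{it})^n$, and then applies Theorem 2 with $a = 1/\sqrt{npq}$ and $b = -\sqrt{np/q}$ to obtain the characteristic function of $(\mu - np)/\sqrt{npq}$ in two equivalent forms.

```python
import cmath
from math import comb, sqrt


def cf_binomial_direct(t, n, p):
    """Sum over k of P{mu = k} * e^{itk} for mu ~ Binomial(n, p)."""
    q = 1.0 - p
    return sum(comb(n, k) * p**k * q**(n - k) * cmath.exp(1j * t * k)
               for k in range(n + 1))


def cf_binomial_product(t, n, p):
    """(q + p e^{it})^n, the product form given by Theorem 3."""
    return (1.0 - p + p * cmath.exp(1j * t)) ** n


def cf_standardized(t, n, p):
    """CF of (mu - np)/sqrt(npq) via Theorem 2: e^{ibt} f(at)."""
    q = 1.0 - p
    a, b = 1.0 / sqrt(n * p * q), -sqrt(n * p / q)
    return cmath.exp(1j * b * t) * cf_binomial_product(a * t, n, p)


def cf_standardized_alt(t, n, p):
    """The same CF with the factor e^{ibt} carried inside the n-th power."""
    q = 1.0 - p
    return (q * cmath.exp(-1j * t * sqrt(p / (n * q)))
            + p * cmath.exp(1j * t * sqrt(q / (n * p)))) ** n


print(abs(cf_binomial_direct(0.7, 20, 0.3) - cf_binomial_product(0.7, 20, 0.3)))
print(abs(cf_standardized(1.3, 50, 0.3) - cf_standardized_alt(1.3, 50, 0.3)))
```

Both printed differences are at the level of rounding error, which confirms that the two displayed forms of the standardized characteristic function agree.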

Example 5. Characteristic functions satisfy the equation

$$f(-t) = \overline{f(t)}$$

Indeed,

$$f(-t) = \int e^{-itx} \, dF(x) = \overline{\int e^{itx} \, dF(x)} = \overline{f(t)}$$


Sec. 36. The Inversion Formula and the Uniqueness Theorem 

We have seen that from the distribution function of a random variable $\xi$ it is always possible to find its characteristic function. For us it is important that the converse proposition holds as well: a distribution function is uniquely determined by its characteristic function.


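Theorem 1. Let $f(t)$ and $F(x)$ be the characteristic function and the distribution function, respectively, of a random variable $\xi$. If $x_1$ and $x_2$ are points of continuity of the function $F(x)$, then

$$F(x_2) - F(x_1) = \frac{1}{2\pi} \lim_{c \to \infty} \int_{-c}^{c} \frac{e^{-itx_1} - e^{-itx_2}}{it} f(t)\,dt \qquad (1)$$
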

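Proof. From the definition of a characteristic function it follows that the integral

$$J_c = \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-itx_1} - e^{-itx_2}}{it} f(t)\,dt$$

is equal to

$$J_c = \frac{1}{2\pi} \int_{-c}^{c} \int_{-\infty}^{\infty} \frac{1}{it} \left[e^{it(z - x_1)} - e^{it(z - x_2)}\right] dF(z)\,dt$$

The order of integration may be changed in the last integral, since the integral converges absolutely with respect to $z$, and the limits of integration are finite with respect to $t$. Thus

$$J_c = \frac{1}{2\pi} \int_{-\infty}^{\infty} \left[\int_{-c}^{c} \frac{e^{it(z - x_1)} - e^{it(z - x_2)}}{it}\,dt\right] dF(z)$$

$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} \left[\int_{0}^{c} \frac{e^{it(z - x_1)} - e^{-it(z - x_1)} - e^{it(z - x_2)} + e^{-it(z - x_2)}}{it}\,dt\right] dF(z)$$

$$= \frac{1}{\pi} \int_{-\infty}^{\infty} \int_{0}^{c} \left[\frac{\sin t(z - x_1)}{t} - \frac{\sin t(z - x_2)}{t}\right] dt\,dF(z)$$
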

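From analysis it is a well-known fact that as $c \to \infty$

$$\frac{1}{\pi} \int_{0}^{c} \frac{\sin \alpha t}{t}\,dt \to \begin{cases} \frac{1}{2} & \text{if } \alpha > 0 \\ -\frac{1}{2} & \text{if } \alpha < 0 \end{cases} \qquad (2)$$

and this convergence is uniform with respect to $\alpha$ in each region $\alpha > \delta > 0$ (or $\alpha < -\delta$), and when $|\alpha| \le \delta$, for all $c$,

$$\left|\frac{1}{\pi} \int_{0}^{c} \frac{\sin \alpha t}{t}\,dt\right| < 1 \qquad (3)$$
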


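Assume for the sake of definiteness that $x_2 > x_1$ and represent the integral $J_c$ as the following sum:

$$J_c = \int_{-\infty}^{x_1 - \delta} \psi\,dF(z) + \int_{x_1 - \delta}^{x_1 + \delta} \psi\,dF(z) + \int_{x_1 + \delta}^{x_2 - \delta} \psi\,dF(z) + \int_{x_2 - \delta}^{x_2 + \delta} \psi\,dF(z) + \int_{x_2 + \delta}^{\infty} \psi\,dF(z)$$

where for brevity we have used the notation

$$\psi = \psi(c, z; x_1, x_2) = \frac{1}{\pi} \int_{0}^{c} \left[\frac{\sin t(z - x_1)}{t} - \frac{\sin t(z - x_2)}{t}\right] dt$$

and $\delta > 0$ is chosen so that $x_1 + \delta < x_2 - \delta$.

The inequalities $z - x_1 < -\delta$ and $z - x_2 < -\delta$ hold in the region $-\infty < z < x_1 - \delta$. We therefore conclude, on the basis of (2), that as $c \to \infty$

$$\int_{-\infty}^{x_1 - \delta} \psi(c, z; x_1, x_2)\,dF(z) \to 0$$

Similarly, when $x_2 + \delta < z < +\infty$ and when $c \to \infty$,

$$\int_{x_2 + \delta}^{\infty} \psi(c, z; x_1, x_2)\,dF(z) \to 0$$

Further, since in the region $x_1 + \delta < z < x_2 - \delta$ the inequalities $z - x_1 > \delta$ and $z - x_2 < -\delta$ are valid, it follows from (2) that as $c \to \infty$,

$$\int_{x_1 + \delta}^{x_2 - \delta} \psi(c, z; x_1, x_2)\,dF(z) \to \int_{x_1 + \delta}^{x_2 - \delta} dF(z) = F(x_2 - \delta) - F(x_1 + \delta)$$

Finally, by (3) we can take advantage of the estimates

$$\left|\int_{x_1 - \delta}^{x_1 + \delta} \psi(c, z; x_1, x_2)\,dF(z)\right| < 2 \int_{x_1 - \delta}^{x_1 + \delta} dF(z) = 2\left[F(x_1 + \delta) - F(x_1 - \delta)\right]$$

and

$$\left|\int_{x_2 - \delta}^{x_2 + \delta} \psi(c, z; x_1, x_2)\,dF(z)\right| < 2 \int_{x_2 - \delta}^{x_2 + \delta} dF(z) = 2\left[F(x_2 + \delta) - F(x_2 - \delta)\right]$$

We thus find that for every $\delta > 0$

$$\limsup_{c \to \infty} J_c = F(x_2 - \delta) - F(x_1 + \delta) + R_1(\delta, x_1, x_2)$$

and

$$\liminf_{c \to \infty} J_c = F(x_2 - \delta) - F(x_1 + \delta) + R_2(\delta, x_1, x_2)$$

where

$$|R_i(\delta, x_1, x_2)| < 2\left\{F(x_1 + \delta) - F(x_1 - \delta) + F(x_2 + \delta) - F(x_2 - \delta)\right\} \qquad (i = 1, 2)$$

Now let $\delta \to 0$. From the fact that $x_1$ and $x_2$ are points of continuity of the function $F(x)$ there follow the equations

$$\lim_{\delta \to 0} F(x_1 + \delta) = \lim_{\delta \to 0} F(x_1 - \delta) = F(x_1)$$

and

$$\lim_{\delta \to 0} F(x_2 + \delta) = \lim_{\delta \to 0} F(x_2 - \delta) = F(x_2)$$

And since $J_c$ does not depend on $\delta$, it follows that

$$\lim_{c \to \infty} J_c = F(x_2) - F(x_1)$$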

Equation (1) is called the inversion formula. We shall use it to 
derive the following important proposition (the uniqueness theorem). 
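The inversion formula (1) can be tried out numerically. The sketch below evaluates the truncated integral for the standard normal characteristic function $e^{-t^2/2}$ (Example 1 of Sec. 35 with $a = 0$, $\sigma = 1$) and compares it with the increment of the normal distribution function computed through `math.erf`. The names `simpson`, `inversion_increment`, `std_normal_cf`, `std_normal_df` and the cutoff `c = 12` are assumptions made for this illustration.

```python
import cmath
import math


def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule for a (possibly complex) integrand; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def inversion_increment(cf, x1, x2, c=12.0):
    """(1 / 2 pi) * integral over (-c, c) of (e^{-itx1} - e^{-itx2}) / (it) * cf(t) dt."""
    def integrand(t):
        if abs(t) < 1e-12:
            return (x2 - x1) * cf(0.0)   # limiting value of the kernel at t = 0
        kernel = (cmath.exp(-1j * t * x1) - cmath.exp(-1j * t * x2)) / (1j * t)
        return kernel * cf(t)
    # The imaginary part integrates to zero by symmetry; keep the real part.
    return simpson(integrand, -c, c).real / (2 * math.pi)


def std_normal_cf(t):
    return math.exp(-t * t / 2)


def std_normal_df(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))
```

For instance, `inversion_increment(std_normal_cf, -0.5, 1.2)` should be close to `std_normal_df(1.2) - std_normal_df(-0.5)`. For characteristic functions that decay slowly, a much larger cutoff `c` would be needed.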

Theorem 2. A distribution function is uniquely determined by its characteristic function.

Proof. Indeed, it follows directly from Theorem 1 that the following formula holds at each point of continuity of the function $F(x)$:

$$F(x) = \frac{1}{2\pi} \lim_{y \to -\infty} \lim_{c \to \infty} \int_{-c}^{c} \frac{e^{-ity} - e^{-itx}}{it} f(t)\,dt$$

where the limit in $y$ is evaluated with respect to any set of points $y$ that are points of continuity of the function $F(x)$.

As an application of the last theorem we will prove the following 
propositions. 

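Example 1. If the independent random variables $\xi_1$ and $\xi_2$ are normally distributed, then their sum $\xi = \xi_1 + \xi_2$ is also distributed normally.

Indeed, if

$$M\xi_1 = a_1, \quad D\xi_1 = \sigma_1^2; \qquad M\xi_2 = a_2, \quad D\xi_2 = \sigma_2^2$$

then the characteristic functions of the variables $\xi_1$ and $\xi_2$ are

$$f_1(t) = e^{ia_1 t - \frac{\sigma_1^2 t^2}{2}}, \qquad f_2(t) = e^{ia_2 t - \frac{\sigma_2^2 t^2}{2}}$$

By Theorem 3, Sec. 35, the characteristic function $f(t)$ of the sum is equal to

$$f(t) = f_1(t) f_2(t) = e^{it(a_1 + a_2) - \frac{t^2}{2}(\sigma_1^2 + \sigma_2^2)}$$

This is the characteristic function of a normal law with expectation $a = a_1 + a_2$ and variance $\sigma^2 = \sigma_1^2 + \sigma_2^2$. On the basis of the uniqueness theorem we conclude that the distribution function of the variable $\xi$ is normal.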

The converse proposition, due to H. Cramér, that we formulated in Sec. 24 may be stated as follows in terms of characteristic functions: if $f_1(t)$ and $f_2(t)$ are characteristic functions and

$$f_1(t) f_2(t) = e^{-\frac{t^2}{2}}$$

then

$$f_1(t) = e^{iat - \frac{\sigma^2 t^2}{2}}, \qquad f_2(t) = e^{-iat - \frac{(1 - \sigma^2) t^2}{2}} \qquad (0 \le \sigma \le 1)$$



Example 2. The independent random variables $\xi_1$ and $\xi_2$ are distributed according to the Poisson law, and

$$P\{\xi_1 = k\} = \frac{\lambda_1^k e^{-\lambda_1}}{k!}, \qquad P\{\xi_2 = k\} = \frac{\lambda_2^k e^{-\lambda_2}}{k!}$$

Prove that the random variable $\xi = \xi_1 + \xi_2$ is distributed in accordance with the Poisson law with parameter $\lambda = \lambda_1 + \lambda_2$.

Indeed, in Example 2 of the preceding section we found that the characteristic functions of the random variables $\xi_1$ and $\xi_2$ are

$$f_1(t) = e^{\lambda_1 (e^{it} - 1)}, \qquad f_2(t) = e^{\lambda_2 (e^{it} - 1)}$$

By Theorem 3 of the preceding section, the characteristic function of the sum $\xi = \xi_1 + \xi_2$ is

$$f(t) = f_1(t) f_2(t) = e^{(\lambda_1 + \lambda_2)(e^{it} - 1)}$$

that is, it is a characteristic function of some Poisson law. By the uniqueness theorem, the only distribution with $f(t)$ as its characteristic function is the Poisson law for which

$$P\{\xi = k\} = \frac{(\lambda_1 + \lambda_2)^k e^{-(\lambda_1 + \lambda_2)}}{k!} \qquad (k \ge 0)$$

D. A. Raikov proved the more profound converse proposition: if the sum of two independent random variables is distributed according to the Poisson law, then each summand is also distributed in accordance with the Poisson law.
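The statement of Example 2 (the sum of two independent Poisson variables is again Poisson) can also be confirmed without characteristic functions, by convolving the two laws directly. The following sketch does this; the function names and the parameters $\lambda_1 = 1.2$, $\lambda_2 = 2.3$ are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P{xi = k} for the Poisson law with parameter lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def pmf_of_sum(k, lam1, lam2):
    """P{xi1 + xi2 = k} for independent Poisson variables, by direct convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
```

For every `k`, `pmf_of_sum(k, 1.2, 2.3)` should agree with `poisson_pmf(k, 3.5)` up to rounding error.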

Example 3. A characteristic function is real when and only when the corresponding distribution function is symmetric, that is, when the distribution function satisfies the equation

$$F(x) = 1 - F(-x + 0)$$

for all $x$.

If the distribution function is symmetric, then its characteristic function is real. Indeed, if $\xi$ has a symmetric distribution function, then $\xi$ and $-\xi$ are identically distributed. Hence the equation

$$f(t) = M e^{it\xi} = M e^{-it\xi} = f(-t) = \overline{f(t)}$$

holds and this means that $f(t)$ is real.

To prove the converse proposition, consider the random variable $\eta = -\xi$. The distribution function of the variable $\eta$ is

$$G(x) = P\{\eta < x\} = P\{\xi > -x\} = 1 - F(-x + 0)$$

The characteristic functions of the variables $\xi$ and $\eta$ are connected by the relation

$$g(t) = M e^{it\eta} = M e^{-it\xi} = \overline{f(t)}$$

Since by hypothesis $f(t)$ is real, it follows that $\overline{f(t)} = f(t)$ and, hence,

$$g(t) = f(t)$$

From the uniqueness theorem we now conclude that the distribution functions of the variables $\xi$ and $\eta$ coincide, that is, that

$$F(x) = 1 - F(-x + 0)$$

and the proof is complete.


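Theorem 3. If the absolute value $|f(t)|$ of a characteristic function is integrable over the entire line, then the corresponding distribution function $F(x)$ is absolutely continuous, its derivative $p(x)$ is continuous and

$$p(x) = F'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\,dt$$

Proof. If the function $f(t)$ is summable over the entire line, then the function $\frac{e^{-itx_1} - e^{-itx_2}}{it} f(t)$ has this property as well, and for this reason the inversion formula may be written as

$$F(x_2) - F(x_1) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx_1} - e^{-itx_2}}{it} f(t)\,dt$$

Now let $h$ be such that $x_1 = x - h$ and $x_2 = x + h$ are continuity points of $F(x)$. After simple formal transformations we arrive at the equation

$$F(x + h) - F(x - h) = 2h \cdot \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\sin th}{th} e^{-itx} f(t)\,dt \qquad (4)$$

Since $\left|\frac{\sin th}{th}\right| \le 1$, it follows that

$$F(x + h) - F(x - h) \le 2h \cdot \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(t)|\,dt$$

This last inequality obviously proves that $F(x)$ is absolutely continuous.

Now (4) may be expressed as follows:

$$\frac{F(x + h) - F(x - h)}{2h} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\sin th}{th} e^{-itx} f(t)\,dt \qquad (5)$$

Since the integrand converges to $e^{-itx} f(t)$ as $h \to 0$, it follows from the well-known Lebesgue theorem, on passing to the limit under the integral sign, that

$$\lim_{h \to 0} \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\sin th}{th} e^{-itx} f(t)\,dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\,dt$$

Since the limit of the right-hand side of Equation (5) exists, the limit of its left-hand side also exists. Thus, for every value of $x$,

$$p(x) = \lim_{h \to 0} \frac{F(x + h) - F(x - h)}{2h} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\,dt$$

By straightforward computation it follows that

$$|p(x + h) - p(x)| \le \frac{1}{\pi} \int_{-\infty}^{\infty} \left|\sin \frac{th}{2}\right| |f(t)|\,dt$$

Evaluate the integral on the right-hand side. To do that write it in the form of a sum:

$$\frac{1}{\pi} \int_{|t| \le A} \left|\sin \frac{th}{2}\right| |f(t)|\,dt + \frac{1}{\pi} \int_{|t| > A} \left|\sin \frac{th}{2}\right| |f(t)|\,dt$$

Let $\varepsilon > 0$ be given. Choose $A$ so large that

$$\frac{1}{\pi} \int_{|t| > A} |f(t)|\,dt < \frac{\varepsilon}{2}$$

The first integral may be made less than $\varepsilon/2$ by choosing $h$ sufficiently small. This completes the proof of the theorem.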
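Theorem 3 gives a practical recipe for recovering a density from an integrable characteristic function. As a hedged illustration, take the sum of two independent variables uniform on $(-1, 1)$: by Example 3 and Theorem 3 of Sec. 35 its characteristic function is $(\sin t / t)^2$, and its density is the triangular density $(2 - |x|)/4$ on $(-2, 2)$ (a standard fact that is assumed here, not derived in the text). The names `simpson`, `uniform_cf`, `sum_cf`, `density_from_cf`, `triangular_density` and the truncation at `cutoff = 400` are assumptions of this sketch.

```python
import math


def simpson(g, lo, hi, n):
    """Composite Simpson rule; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def uniform_cf(t, a=1.0):
    """sin(at) / (at), the characteristic function of the uniform law on (-a, a)."""
    return 1.0 if abs(t) < 1e-12 else math.sin(a * t) / (a * t)


def sum_cf(t):
    """Characteristic function of the sum of two independent uniform(-1, 1) variables."""
    return uniform_cf(t) ** 2


def density_from_cf(cf, x, cutoff=400.0, n=80000):
    """(1 / 2 pi) * integral of e^{-itx} cf(t) dt, truncated to (-cutoff, cutoff).

    Valid as written only for a real, even cf, for which e^{-itx} cf(t)
    integrates to the same value as cos(tx) cf(t).
    """
    return simpson(lambda t: math.cos(t * x) * cf(t), -cutoff, cutoff, n) / (2 * math.pi)


def triangular_density(x):
    return max(0.0, 2 - abs(x)) / 4
```

Because $(\sin t/t)^2$ decays only like $1/t^2$, the truncation leaves an error of order $1/(\pi \cdot \text{cutoff})$, so agreement to about three decimal places is all that should be expected from these settings.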


Sec. 37. Helly's Theorems

In the sequel we shall require two theorems of a purely analytic 
nature: the first and second Helly theorems. 

Let us agree that a sequence of nondecreasing functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots$$

converges weakly to a nondecreasing function $F(x)$ if as $n \to \infty$ it converges to this function at every one of its points of continuity.

Henceforth we will always assume that the functions $F_n(x)$ satisfy the supplementary condition

$$F_n(-\infty) = 0$$

and will not mention this fact each time. 
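The restriction to continuity points in the definition of weak convergence matters. A minimal sketch, assuming the left-continuous convention $F(x) = P\{\xi < x\}$ and a hypothetical helper `step_df`: let $F_n$ be the distribution function of a unit mass placed at $-1/n$ and $F$ that of a unit mass at 0. Then $F_n(x) \to F(x)$ at every $x \ne 0$, while at the discontinuity point $x = 0$ the values $F_n(0) = 1$ do not approach $F(0) = 0$. The sequence nevertheless converges weakly to $F$.

```python
def step_df(jump_at):
    """Left-continuous distribution function F(x) = P{xi < x} of a unit mass at jump_at."""
    return lambda x: 1.0 if x > jump_at else 0.0


F = step_df(0.0)


def F_n(n):
    """Distribution function of a unit mass at -1/n."""
    return step_df(-1.0 / n)
```

Evaluating `F_n(n)(x)` for growing `n` at a fixed `x` different from 0 eventually reproduces `F(x)`, whereas `F_n(n)(0.0)` stays equal to 1.0 for every `n`.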





We note at once that for weak convergence it is sufficient that the sequence of functions converge to the function $F(x)$ on some everywhere-dense set $D$. Indeed, let $x$ be any point and $x'$ and $x''$ be some two points of the set $D$ such that $x' \le x \le x''$. Then

$$F_n(x') \le F_n(x) \le F_n(x'')$$

Consequently,

$$\lim_{n \to \infty} F_n(x') \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le \lim_{n \to \infty} F_n(x'')$$

And since by hypothesis

$$\lim_{n \to \infty} F_n(x') = F(x') \quad \text{and} \quad \lim_{n \to \infty} F_n(x'') = F(x'')$$

it also follows that

$$F(x') \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x'')$$

But the middle terms in these inequalities do not depend on $x'$ and $x''$, and so

$$F(x - 0) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F(x + 0)$$

If the function $F(x)$ is continuous at the point $x$, then

$$F(x - 0) = F(x) = F(x + 0)$$

Consequently, at continuity points of the function $F(x)$,

$$\lim_{n \to \infty} F_n(x) = F(x)$$

Helly's First Theorem. Any sequence of uniformly bounded nondecreasing functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots \qquad (1)$$

contains at least one subsequence

$$F_{n_1}(x),\ F_{n_2}(x),\ \ldots,\ F_{n_k}(x),\ \ldots$$

that converges weakly to some nondecreasing function $F(x)$.

Proof. Let $D$ be some countable everywhere-dense set of points $x'_1, x'_2, \ldots, x'_n, \ldots$. Take the values of the functions of the sequence (1) at the point $x'_1$:

$$F_1(x'_1),\ F_2(x'_1),\ \ldots,\ F_n(x'_1),\ \ldots$$

Since, by hypothesis, the set of these values is bounded, it contains at least one subsequence

$$F_{11}(x'_1),\ F_{12}(x'_1),\ \ldots,\ F_{1n}(x'_1),\ \ldots \qquad (2)$$

that converges to a certain limiting value, which we denote by $G(x'_1)$. We now consider the set of numbers

$$F_{11}(x'_2),\ F_{12}(x'_2),\ \ldots,\ F_{1n}(x'_2),\ \ldots$$

Since this set is bounded as well, there exists in it a subsequence that converges to some limiting value $G(x'_2)$. Thus we can extract from the sequence (2) a subsequence

$$F_{21}(x),\ F_{22}(x),\ \ldots,\ F_{2n}(x),\ \ldots \qquad (3)$$

for which simultaneously $\lim_{n \to \infty} F_{2n}(x'_1) = G(x'_1)$ and $\lim_{n \to \infty} F_{2n}(x'_2) = G(x'_2)$.

We continue this extraction of subsequences

$$F_{k1}(x),\ F_{k2}(x),\ \ldots,\ F_{kn}(x),\ \ldots \qquad (4)$$

for which the equations $\lim_{n \to \infty} F_{kn}(x'_r) = G(x'_r)$ hold simultaneously for all $r \le k$. Now construct the diagonal sequence

$$F_{11}(x),\ F_{22}(x),\ \ldots,\ F_{nn}(x),\ \ldots \qquad (5)$$

The whole of it has been extracted from the sequence (2), and so for it $\lim_{n \to \infty} F_{nn}(x'_1) = G(x'_1)$. Further, since the entire diagonal sequence, with the exception of the first term only, has been extracted from the sequence (3), it follows that $\lim_{n \to \infty} F_{nn}(x'_2) = G(x'_2)$. Generally speaking, the entire diagonal sequence, with the exception of the first $k - 1$ terms, has been extracted from the sequence (4), and so for it, too, the relation $\lim_{n \to \infty} F_{nn}(x'_k) = G(x'_k)$ holds for every $k$. The result may be formulated thus: the sequence (1) contains at least one subsequence which converges at all points $x'_k$ of the set $D$ to some function $G(x)$ defined on $D$. And since the functions $F_{nn}(x)$ do not decrease and are uniformly bounded, it is obvious that the function $G(x)$ as well will be nondecreasing and bounded. It is now clear that the function $G(x)$ defined on the set $D$ may be continued so that it will be defined over the entire line $-\infty < x < \infty$, while remaining nondecreasing and bounded.

The sequence (5) converges to this function on the everywhere-dense set $D$; hence, it converges weakly to it, which is what we set out to prove. It will be noted that the function obtained by continuing the function $G$ may prove not to be continuous from the left. But we can change its values at the discontinuity points so as to restore this property. The subsequence $F_{nn}$ will converge weakly to the thus "corrected" function.

Helly's Second Theorem. Let $f(x)$ be a continuous function and let the sequence of nondecreasing uniformly bounded functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots$$

converge weakly to the function $F(x)$ on some finite interval $a \le x \le b$, where $a$ and $b$ are continuity points of the function $F(x)$; then

$$\lim_{n \to \infty} \int_a^b f(x)\,dF_n(x) = \int_a^b f(x)\,dF(x)$$

Proof. From the continuity of the function $f(x)$ it follows that for any positive constant $\varepsilon$ there will be a subdivision of the interval $a \le x \le b$ by the points $x_0 = a, x_1, \ldots, x_N = b$ into subintervals $x_k \le x \le x_{k+1}$ such that in each interval $(x_k, x_{k+1})$ the inequality $|f(x) - f(x_k)| < \varepsilon$ will hold. Taking advantage of this circumstance, we can introduce an auxiliary function $f_\varepsilon(x)$ that takes on only a finite number of values and define it by means of the equalities

$$f_\varepsilon(x) = f(x_k) \quad \text{for } x_k \le x < x_{k+1}$$

Clearly, for all $x$ in the interval $a \le x \le b$ the inequality

$$|f(x) - f_\varepsilon(x)| < \varepsilon$$

holds. In doing so we can select beforehand the subdivision points $x_1, x_2, \ldots, x_{N-1}$ so that they will be continuity points of the function $F(x)$. By virtue of the convergence of the functions $F_1(x), F_2(x), F_3(x), \ldots$ to the function $F(x)$, the following inequalities will hold at all subdivision points for $n$ sufficiently large:

$$|F(x_k) - F_n(x_k)| < \frac{\varepsilon}{MN} \qquad (6)$$

where $M$ is the maximum of the absolute value of $f(x)$ in the interval $a \le x \le b$.

It is clear without explanation that

$$\left|\int_a^b f(x)\,dF(x) - \int_a^b f(x)\,dF_n(x)\right| \le \left|\int_a^b f(x)\,dF(x) - \int_a^b f_\varepsilon(x)\,dF(x)\right| + \left|\int_a^b f_\varepsilon(x)\,dF(x) - \int_a^b f_\varepsilon(x)\,dF_n(x)\right| + \left|\int_a^b f_\varepsilon(x)\,dF_n(x) - \int_a^b f(x)\,dF_n(x)\right|$$

It is easy to compute that the first summand on the right-hand side does not exceed $\varepsilon[F(b) - F(a)]$ and the third does not exceed $\varepsilon[F_n(b) - F_n(a)]$. But the second summand is found to be equal to

$$\left|\sum_{k=0}^{N-1} f(x_k)[F(x_{k+1}) - F(x_k)] - \sum_{k=0}^{N-1} f(x_k)[F_n(x_{k+1}) - F_n(x_k)]\right| = \left|\sum_{k=0}^{N-1} f(x_k)[F(x_{k+1}) - F_n(x_{k+1})] - \sum_{k=0}^{N-1} f(x_k)[F(x_k) - F_n(x_k)]\right|$$

and, consequently, for $n$ sufficiently large it does not exceed $2\varepsilon$, as follows from the inequality (6). By virtue of the uniform boundedness of the functions $F_n(x)$, the sum

$$\varepsilon[F(b) - F(a)] + \varepsilon[F_n(b) - F_n(a)] + 2\varepsilon$$

can be made arbitrarily small together with $\varepsilon$.
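A simple special case shows what Helly's second theorem asserts. Assume (for illustration only) that $F_n$ places mass $1/n$ at each midpoint $(k - \tfrac12)/n$, $k = 1, \ldots, n$, of the interval $[0, 1]$. These functions converge weakly to $F(x) = x$ on $[0, 1]$, and the Stieltjes integral $\int_0^1 f\,dF_n$ is just an average of values of $f$, which should approach $\int_0^1 f(x)\,dx$. The function name below is hypothetical.

```python
import math


def stieltjes_midpoint_masses(f, n):
    """Integral of f dF_n over [0, 1], where F_n has jumps 1/n at (k - 1/2)/n, k = 1..n."""
    return sum(f((k - 0.5) / n) for k in range(1, n + 1)) / n
```

With `f = math.cos` the values `stieltjes_midpoint_masses(math.cos, n)` should approach `math.sin(1.0)`, the integral of the cosine over $[0, 1]$, as `n` grows.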

The Generalized Second Theorem of Helly. If the function $f(x)$ is continuous and bounded over the entire line $-\infty < x < \infty$, the sequence of uniformly bounded nondecreasing functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots$$

converges weakly to the function $F(x)$ and

$$\lim_{n \to \infty} F_n(-\infty) = F(-\infty), \qquad \lim_{n \to \infty} F_n(+\infty) = F(+\infty)$$

it follows that

$$\lim_{n \to \infty} \int f(x)\,dF_n(x) = \int f(x)\,dF(x)$$

Proof. Let $A < 0$ and $B > 0$; we put

$$J_1 = \left|\int_{-\infty}^{A} f(x)\,dF(x) - \int_{-\infty}^{A} f(x)\,dF_n(x)\right|$$

$$J_2 = \left|\int_{A}^{B} f(x)\,dF(x) - \int_{A}^{B} f(x)\,dF_n(x)\right|$$

$$J_3 = \left|\int_{B}^{\infty} f(x)\,dF(x) - \int_{B}^{\infty} f(x)\,dF_n(x)\right|$$

It is obvious that

$$\left|\int f(x)\,dF(x) - \int f(x)\,dF_n(x)\right| \le J_1 + J_2 + J_3$$

The quantities $J_1$ and $J_3$ may be made arbitrarily small if one chooses $A$ and $B$ sufficiently large in absolute value and also such that the points $A$ and $B$ are points of continuity of the function $F(x)$, and by choosing $n$ sufficiently large. Indeed, let $M$ be the upper bound of $|f(x)|$ for $-\infty < x < \infty$; then

$$J_1 \le M[F(A) + F_n(A)]$$

$$J_3 \le M[F(+\infty) - F(B)] + M[F_n(+\infty) - F_n(B)]$$

But

$$\lim_{A \to -\infty} F(A) = 0, \qquad \lim_{B \to +\infty} F(B) = F(+\infty)$$

And since by hypothesis

$$\lim_{n \to \infty} F_n(A) = F(A), \qquad \lim_{n \to \infty} F_n(B) = F(B)$$

our assertion about $J_1$ and $J_3$ is proved. For $n$ sufficiently large, the quantity $J_2$ may be made arbitrarily small by virtue of Helly's theorem for a finite interval.

The theorem is proved. 

Sec. 38. Limit Theorems for Characteristic Functions 

From the point of view of the applications of characteristic functions 
to the derivation of asymptotic formulas of probability theory, two 
limit theorems (direct and converse) are of prime importance. These 
theorems state that the correspondence existing between distribution 
functions and characteristic functions is not only one-to-one but also 
continuous. 

The Direct Limit Theorem. If a sequence of distribution functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots$$

converges weakly to the distribution function $F(x)$, then the sequence of characteristic functions

$$f_1(t),\ f_2(t),\ \ldots,\ f_n(t),\ \ldots$$

converges to the characteristic function $f(t)$. This convergence is uniform in each finite interval of $t$.

Proof. Since

$$f_n(t) = \int e^{itx}\,dF_n(x), \qquad f(t) = \int e^{itx}\,dF(x)$$

and the function $e^{itx}$ is continuous and bounded over the entire line $-\infty < x < \infty$, according to the generalized second theorem of Helly, as $n \to \infty$,

$$f_n(t) \to f(t)$$

The assertion that this convergence is uniform in every finite interval of $t$ is verified by literally the same arguments as were used in proving Helly's second theorem.





The Converse Limit Theorem. If a sequence of characteristic functions

$$f_1(t),\ f_2(t),\ \ldots,\ f_n(t),\ \ldots \qquad (1)$$

converges to the continuous function $f(t)$, then the sequence of distribution functions

$$F_1(x),\ F_2(x),\ \ldots,\ F_n(x),\ \ldots \qquad (2)$$

converges weakly to some distribution function $F(x)$ (by virtue of the direct limit theorem, $f(t) = \int e^{itx}\,dF(x)$).

Proof. On the basis of Helly's first theorem we conclude that the sequence (2) definitely contains a subsequence

$$F_{n_1}(x),\ F_{n_2}(x),\ \ldots,\ F_{n_k}(x),\ \ldots \qquad (3)$$

which converges weakly to some nondecreasing function $F(x)$. It is clear in this case that the function $F(x)$ may be considered continuous on the left:

$$\lim_{x' \to x - 0} F(x') = F(x)$$

Generally speaking, the function $F(x)$ need not be a distribution function, since for this to be the case the conditions $F(-\infty) = 0$ and $F(+\infty) = 1$ must also hold. Indeed, for the sequence of functions

$$F_n(x) = \begin{cases} 0 & \text{for } x \le -n \\ \frac{1}{2} & \text{for } -n < x \le n \\ 1 & \text{for } x > n \end{cases}$$

the limit function is $F(x) \equiv \frac{1}{2}$, and, consequently, $F(-\infty)$ and $F(+\infty)$ are also equal to $\frac{1}{2}$. However, as will now be shown, under the conditions of our theorem we will definitely have $F(-\infty) = 0$ and $F(+\infty) = 1$.

Indeed, if this were not so, then, taking into account that for the limit function $F(x)$ the relations $F(-\infty) \ge 0$ and $F(+\infty) \le 1$ must hold, we would have

$$\delta = F(+\infty) - F(-\infty) < 1$$

Now take some positive number $\varepsilon$ less than $1 - \delta$. Since by hypothesis the sequence of characteristic functions (1) converges to the function $f(t)$, it follows that $f(0) = 1$. And since also the function $f(t)$ is continuous, one can choose a positive number $\tau$ so small that the inequality

$$\frac{1}{2\tau} \left|\int_{-\tau}^{\tau} f(t)\,dt\right| > 1 - \frac{\varepsilon}{2} > \delta + \frac{\varepsilon}{2} \qquad (4)$$

will hold. But at the same time we can choose $X > \frac{4}{\tau\varepsilon}$ and $K$ so large that for $k > K$

$$\delta_k = F_{n_k}(X) - F_{n_k}(-X) < \delta + \frac{\varepsilon}{4}$$

Since $f_{n_k}(t)$ is a characteristic function, it follows that

$$\int_{-\tau}^{\tau} f_{n_k}(t)\,dt = \int \left[\int_{-\tau}^{\tau} e^{itx}\,dt\right] dF_{n_k}(x)$$

The integral on the right-hand side of this equation may be evaluated as follows. On the one hand, since $|e^{itx}| = 1$,

$$\left|\int_{-\tau}^{\tau} e^{itx}\,dt\right| \le 2\tau$$

On the other hand,

$$\int_{-\tau}^{\tau} e^{itx}\,dt = \frac{2}{x} \sin \tau x$$

and since $|\sin \tau x| \le 1$, for $|x| > X$

$$\left|\int_{-\tau}^{\tau} e^{itx}\,dt\right| < \frac{2}{X}$$

From this, using the first estimate for $|x| \le X$ and the second for $|x| > X$, we get

$$\left|\int_{-\tau}^{\tau} f_{n_k}(t)\,dt\right| \le \left|\int_{|x| \le X} \left(\int_{-\tau}^{\tau} e^{itx}\,dt\right) dF_{n_k}(x)\right| + \left|\int_{|x| > X} \left(\int_{-\tau}^{\tau} e^{itx}\,dt\right) dF_{n_k}(x)\right| < 2\tau\delta_k + \frac{2}{X}$$

and, hence,

$$\frac{1}{2\tau} \left|\int_{-\tau}^{\tau} f_{n_k}(t)\,dt\right| < \delta + \frac{\varepsilon}{2}$$

This inequality continues to hold in the limit as well:

$$\frac{1}{2\tau} \left|\int_{-\tau}^{\tau} f(t)\,dt\right| \le \delta + \frac{\varepsilon}{2}$$

which obviously contradicts inequality (4).

Thus, the function $F(x)$, to which the sequence $F_{n_k}(x)$ weakly converges, is a distribution function; by the direct limit theorem its characteristic function is $f(t)$.

To complete the proof of the theorem it remains to prove that the entire sequence (2) also converges weakly to the function $F(x)$. We assume that this is not so. Then there will be a subsequence of functions

$$F_{n'_1}(x),\ F_{n'_2}(x),\ \ldots,\ F_{n'_k}(x),\ \ldots \qquad (5)$$

that converges weakly to some function $F^*(x)$ different from $F(x)$ in at least one of its points of continuity. From what has already been proved $F^*(x)$ must be a distribution function with the characteristic function $f(t)$. By the uniqueness theorem we should have

$$F^*(x) = F(x)$$


This contradicts the assumption. 

We note that the hypothesis of the theorem is satisfied in each of the following two cases:

(1) The sequence of characteristic functions $f_n(t)$ converges to some function $f(t)$ uniformly in every finite interval of $t$.

(2) The sequence of characteristic functions $f_n(t)$ converges to a characteristic function $f(t)$.

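Example. To illustrate the use of limit theorems we consider the proof of the integral theorem of DeMoivre-Laplace.

In Example 4, Sec. 35, we found the characteristic function of the random variable $\eta = \frac{\mu - np}{\sqrt{npq}}$:

$$f_n(t) = \left(q e^{-it\sqrt{\frac{p}{nq}}} + p e^{it\sqrt{\frac{q}{np}}}\right)^n$$

Taking advantage of the expansion in a Maclaurin series, we find

$$q e^{-it\sqrt{\frac{p}{nq}}} + p e^{it\sqrt{\frac{q}{np}}} = 1 - \frac{t^2}{2n}(1 + R_n)$$

where

$$R_n = 2 \sum_{k=3}^{\infty} \frac{1}{k!} \left(\frac{it}{\sqrt{n}}\right)^{k-2} \frac{p q^k + q (-p)^k}{(\sqrt{pq})^k}$$

Since $R_n \to 0$ as $n \to \infty$, it follows that

$$f_n(t) = \left[1 - \frac{t^2}{2n}(1 + R_n)\right]^n \to e^{-\frac{t^2}{2}}$$

By virtue of the converse limit theorem, it follows from the foregoing that for every $x$

$$P\left\{\frac{\mu - np}{\sqrt{npq}} < x\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz$$

as $n \to \infty$.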

From the continuity of the limit function it is easy to deduce that 
this convergence will be uniform in x. 
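The convergence $f_n(t) \to e^{-t^2/2}$ used in this example is easy to observe numerically. The following sketch evaluates the characteristic function of the standardized number of successes for several values of $n$; the function names and the sample values $t = 1.5$, $p = 0.3$ are assumptions for illustration.

```python
import cmath
import math


def standardized_binomial_cf(t, n, p):
    """(q e^{-it sqrt(p/(nq))} + p e^{it sqrt(q/(np))})^n for eta = (mu - np) / sqrt(npq)."""
    q = 1 - p
    base = (q * cmath.exp(-1j * t * math.sqrt(p / (n * q)))
            + p * cmath.exp(1j * t * math.sqrt(q / (n * p))))
    return base ** n


def normal_limit_cf(t):
    """Limiting characteristic function exp(-t^2 / 2)."""
    return math.exp(-t * t / 2)
```

Printing `abs(standardized_binomial_cf(1.5, n, 0.3) - normal_limit_cf(1.5))` for `n = 10, 1000, 100000` should show the difference shrinking, roughly in proportion to $1/\sqrt{n}$ when $p \ne q$.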

Sec. 39. Positive Definite Functions. 

The purpose of this section is to give an exhaustive description 
of the class of characteristic functions. The basic theorem given below 
was discovered by A. Ya. Khinchin and S. Bochner at the same time 
and was first published by S. Bochner. 

To formulate and prove this theorem we have to introduce a new concept. We will say that the continuous function $f(t)$ of the real argument $t$ is positive definite in the interval $-\infty < t < \infty$ if, for any real numbers $t_1, t_2, \ldots, t_n$, complex numbers $\xi_1, \xi_2, \ldots, \xi_n$ and integer $n$,

$$\sum_{k=1}^{n} \sum_{j=1}^{n} f(t_k - t_j)\,\xi_k \overline{\xi_j} \ge 0 \qquad (1)$$


We list a few of the most elementary properties of positive 
definite functions. 
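Condition (1) can be probed numerically for a given function. The sketch below evaluates the quadratic form for the normal characteristic function $e^{-t^2/2}$ at randomly chosen points, and also for $1 + t^2$, which is not positive definite: with $n = 2$, $t_1 = 0$, $t_2 = 1$, $\xi_1 = 2$, $\xi_2 = -2$ the form equals $4 - 8 - 8 + 4 = -8$. All names, the random sampling scheme and the seed are assumptions of this illustration; a numerical check of this kind can refute positive definiteness but cannot prove it.

```python
import math
import random


def quadratic_form(f, ts, xis):
    """Double sum of f(t_k - t_j) * xi_k * conj(xi_j) over all pairs (k, j)."""
    return sum(
        f(tk - tj) * xk * xj.conjugate()
        for tk, xk in zip(ts, xis)
        for tj, xj in zip(ts, xis)
    )


def normal_cf(t):
    return math.exp(-t * t / 2)


def random_form_value(f, n=6, seed=0):
    """Value of the form at n random real points and n random complex coefficients."""
    rng = random.Random(seed)
    ts = [rng.uniform(-3, 3) for _ in range(n)]
    xis = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    return quadratic_form(f, ts, xis)
```

For `normal_cf` the returned value should have a nonnegative real part and a negligible imaginary part for every seed.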

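1. $f(0) \ge 0$. Indeed, put $n = 1$, $t_1 = 0$, $\xi_1 = 1$; then from the condition of positive definiteness of the function $f(t)$ we find

$$\sum_{k=1}^{1} \sum_{j=1}^{1} f(t_k - t_j)\,\xi_k \overline{\xi_j} = f(0) \ge 0$$

2. For any real $t$,

$$f(-t) = \overline{f(t)}$$

To prove this, in (1) put $n = 2$, $t_1 = 0$, $t_2 = t$, and $\xi_1$, $\xi_2$ arbitrary. By hypothesis we have

$$0 \le \sum_{k=1}^{2} \sum_{j=1}^{2} f(t_k - t_j)\,\xi_k \overline{\xi_j} = f(0 - 0)\,\xi_1 \overline{\xi_1} + f(0 - t)\,\xi_1 \overline{\xi_2} + f(t - 0)\,\xi_2 \overline{\xi_1} + f(t - t)\,\xi_2 \overline{\xi_2}$$

$$= f(0)\left(|\xi_1|^2 + |\xi_2|^2\right) + f(-t)\,\xi_1 \overline{\xi_2} + f(t)\,\overline{\xi_1} \xi_2 \qquad (2)$$

and so the quantity

$$f(-t)\,\xi_1 \overline{\xi_2} + f(t)\,\overline{\xi_1} \xi_2$$

must be real. Thus, if we put $f(-t) = \alpha_1 + i\beta_1$, $f(t) = \alpha_2 + i\beta_2$, $\xi_1 \overline{\xi_2} = \gamma + i\delta$, $\overline{\xi_1} \xi_2 = \gamma - i\delta$, then it must be that

$$\alpha_1 \delta + \beta_1 \gamma - \alpha_2 \delta + \beta_2 \gamma = 0$$

Since $\xi_1$ and $\xi_2$ and, hence, $\gamma$ and $\delta$ are arbitrary, it must be that

$$\alpha_1 - \alpha_2 = 0 \quad \text{and} \quad \beta_1 + \beta_2 = 0$$

From this our assertion follows.

3. For all real $t$

$$|f(t)| \le f(0)$$

In inequality (2) put $\xi_1 = f(t)$, $\xi_2 = -|f(t)|$; then from the preceding result

$$2 f(0)\,|f(t)|^2 - 2\,|f(t)|^3 \ge 0$$

From this we get, when $|f(t)| \ne 0$,

$$f(0) \ge |f(t)|$$

But if $|f(t)| = 0$, then again by virtue of Property 1, we have $f(0) \ge |f(t)|$.

From what has been proved it follows incidentally that if a positive definite function is such that $f(0) = 0$, then $f(t) \equiv 0$.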

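Bochner-Khinchin Theorem. For a continuous function $f(t)$ satisfying the condition $f(0) = 1$ to be characteristic, it is necessary and sufficient that it be positive definite.

Proof. In one direction the theorem is trivial. Indeed, if

$$f(t) = \int e^{itx}\,dF(x)$$

where $F(x)$ is some distribution function, then for any integer $n$, arbitrary real $t_1, t_2, \ldots, t_n$ and complex numbers $\xi_1, \xi_2, \ldots, \xi_n$ we have

$$\sum_{k=1}^{n} \sum_{j=1}^{n} f(t_k - t_j)\,\xi_k \overline{\xi_j} = \sum_{k=1}^{n} \sum_{j=1}^{n} \left\{\int e^{ix(t_k - t_j)}\,dF(x)\right\} \xi_k \overline{\xi_j} = \int \sum_{k=1}^{n} \sum_{j=1}^{n} e^{ix(t_k - t_j)}\,\xi_k \overline{\xi_j}\,dF(x)$$

$$= \int \left(\sum_{k=1}^{n} e^{ixt_k} \xi_k\right) \left(\sum_{j=1}^{n} e^{-ixt_j} \overline{\xi_j}\right) dF(x) = \int \left|\sum_{k=1}^{n} e^{ixt_k} \xi_k\right|^2 dF(x) \ge 0$$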





The sufficiency proof requires a more involved reasoning. 

The proof given here is taken from Yu. V. Linnik’s book. To a 
considerable extent it relies on Theorem 3, Sec. 36, the limit 
theorems for characteristic functions, and on the following lemma. 

Lemma. If the function $\varphi(t)$ is measurable, bounded and summable on the interval $(-T, T)$ and

$$f(x) = \int_{-T}^{T} e^{-ixt} \varphi(t)\,dt \ge 0 \qquad (3)$$

then the function $f(x)$ is integrable on the entire line.

Proof. The function $f(x)$ being continuous, it is integrable over any finite interval. Put

$$G(x) = \int_{-x}^{x} f(z)\,dz$$

By virtue of the nonnegativity of $f(z)$ the function $G(x)$ is nondecreasing, and so to prove the lemma it suffices to demonstrate that $G(x)$ is bounded. For this purpose consider the function

$$F(u) = \frac{1}{u} \int_{u}^{2u} G(x)\,dx$$

Clearly,

$$F(u) \ge \frac{1}{u} \int_{u}^{2u} G(u)\,dx = G(u)$$

and, consequently, if it is shown that the function $F(x)$ is bounded, then that will prove the lemma.

It is easy to verify that

$$G(x) = 2 \int_{-T}^{T} \frac{\sin xt}{t} \varphi(t)\,dt$$

and

$$F(x) = \frac{4}{x} \int_{-T}^{T} \frac{\sin^2 xt}{t^2} \varphi(t)\,dt - \frac{4}{x} \int_{-T}^{T} \frac{\sin^2 \frac{xt}{2}}{t^2} \varphi(t)\,dt$$

Let $M = \sup |\varphi(t)|$; then

$$|F(x)| \le 4M \int_{-xT}^{xT} \frac{\sin^2 u}{u^2}\,du + 2M \int_{-xT/2}^{xT/2} \frac{\sin^2 u}{u^2}\,du$$

It is obvious that

$$\int_{-xT}^{xT} \frac{\sin^2 u}{u^2}\,du \le \int_{-\infty}^{\infty} \frac{\sin^2 u}{u^2}\,du = \pi$$

The boundedness of the functions $F(x)$ and $G(x)$ is proved, and this proves the lemma.

Now assume that the function $f(t)$ is positive definite, continuous and $f(0) = 1$. Let $Z > 0$. We consider the function

$$p_Z(x) = \frac{1}{2\pi Z} \int_{0}^{Z} \int_{0}^{Z} f(u - v)\,e^{-iux} e^{ivx}\,du\,dv$$

The function $p_Z(x)$ must be nonnegative by virtue of the positive definiteness of $f(t)$ (the double integral is the limit of the corresponding sums). If in the double integral we make the change of variables

$$t = u - v, \qquad z = v$$

and perform elementary transformations, it will be found that

$$p_Z(x) = \frac{1}{2\pi} \int_{-Z}^{Z} e^{-ixt} \left(1 - \frac{|t|}{Z}\right) f(t)\,dt$$

Since the function $p_Z(x)$ is nonnegative and representable as (3), the lemma just proved may be applied to it. And so $p_Z(x)$ is integrable over the entire line. The function $f(t)$ is continuous and therefore from Theorem 3, Sec. 36, it follows that

$$\left(1 - \frac{|t|}{Z}\right) f(t) = \int_{-\infty}^{\infty} p_Z(x)\,e^{itx}\,dx$$

for all $t$ ($|t| \le Z$). In particular, for $t = 0$

$$\int_{-\infty}^{\infty} p_Z(x)\,dx = f(0) = 1$$

Thus, $p_Z(x)$ is the density of some probability distribution and so $\left(1 - \frac{|t|}{Z}\right) f(t)$ is the corresponding characteristic function. For $Z \to \infty$, the functions

$$\left(1 - \frac{|t|}{Z}\right) f(t)$$

uniformly converge to the function $f(t)$ in every finite interval of $t$. From this it follows that $f(t)$ is a characteristic function.

This completes the proof of the theorem. 
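The auxiliary densities $p_Z(x)$ of the proof can be computed for a concrete positive definite function. The sketch below takes $f(t) = e^{-t^2/2}$ and evaluates $p_Z(x) = \frac{1}{2\pi}\int_{-Z}^{Z} e^{-ixt}(1 - |t|/Z) f(t)\,dt$ by Simpson's rule. It is an illustration under stated assumptions: the names `simpson`, `p_Z`, `normal_cf`, the choice of $f$ and the grid size are not from the text, and the formula is simplified for a real even $f$.

```python
import math


def simpson(g, lo, hi, n):
    """Composite Simpson rule; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def normal_cf(t):
    return math.exp(-t * t / 2)


def p_Z(x, cf, Z, n=2000):
    """(1 / 2 pi) * integral over (-Z, Z) of cos(xt) (1 - |t|/Z) cf(t) dt.

    For a real even cf the factor e^{-ixt} may be replaced by cos(xt).
    With n divisible by 4 the kink of |t| at t = 0 falls on a panel boundary.
    """
    g = lambda t: math.cos(x * t) * (1 - abs(t) / Z) * cf(t)
    return simpson(g, -Z, Z, n) / (2 * math.pi)
```

The values `p_Z(x, normal_cf, Z)` should be positive, and as `Z` grows `p_Z(0.0, normal_cf, Z)` should approach the normal density at zero, $1/\sqrt{2\pi}$, in line with the convergence of $(1 - |t|/Z) f(t)$ to $f(t)$.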





Sec. 40. Characteristic Functions of Multidimensional Random 
Variables 

In this section we give, without proof, basic information concerning the characteristic functions of multidimensional random variables.

The characteristic function of an $n$-dimensional random variable $(\xi_1, \xi_2, \ldots, \xi_n)$ is defined as the expectation of the variable $\exp i(t_1 \xi_1 + t_2 \xi_2 + \ldots + t_n \xi_n)$, where $t_1, t_2, \ldots, t_n$ are real variables:

$$f(t_1, t_2, \ldots, t_n) = M \exp\left(i \sum_{k=1}^{n} t_k \xi_k\right) \qquad (1)$$

If $F(x_1, x_2, \ldots, x_n)$ is the distribution function of the variable $(\xi_1, \xi_2, \ldots, \xi_n)$, then, as we know from the preceding result*,

$$f(t_1, t_2, \ldots, t_n) = \int \ldots \int \exp\left(i \sum_{k=1}^{n} t_k x_k\right) dF(x_1, \ldots, x_n) \qquad (2)$$

As in the one-dimensional case, the characteristic function of an $n$-dimensional random variable is uniformly continuous over the entire space ($-\infty < t_j < +\infty$, $1 \le j \le n$) and satisfies the following relations:

$$f(0, 0, \ldots, 0) = 1$$

$$|f(t_1, t_2, \ldots, t_n)| \le 1 \qquad (-\infty < t_k < \infty,\ k = 1, 2, \ldots, n)$$

$$f(-t_1, -t_2, \ldots, -t_n) = \overline{f(t_1, t_2, \ldots, t_n)}$$

From the characteristic function $f(t_1, t_2, \ldots, t_n)$ of the random variable $(\xi_1, \xi_2, \ldots, \xi_n)$ it is easy to find the characteristic function of any $k$-dimensional ($k < n$) variable $(\xi_{j_1}, \xi_{j_2}, \ldots, \xi_{j_k})$ whose components are the variables $\xi_{j_s}$ ($1 \le s \le k$). To do this, in formula (2) one has to put equal to zero all arguments $t_s$ for $s \ne j_r$ ($1 \le r \le k$). Thus, for example, the characteristic function of the variable $\xi_1$ is

$$f_1(t_1) = f(t_1, 0, \ldots, 0)$$

It follows from the definition that if the components of the variable $(\xi_1, \xi_2, \ldots, \xi_n)$ are independent random variables, then its characteristic function is equal to the product of the characteristic functions of the components

$$f(t_1, t_2, \ldots, t_n) = f_1(t_1) f_2(t_2) \ldots f_n(t_n)$$

* Compare Theorem 1, Sec. 27, and the remark on multidimensional Stieltjes integrals in Sec. 26.





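Just as in the one-dimensional case, multidimensional characteristic functions make it easy to find moments of various orders. Thus, for example,

$$M \xi_1^{k_1} \xi_2^{k_2} \ldots \xi_n^{k_n} = \int \ldots \int x_1^{k_1} x_2^{k_2} \ldots x_n^{k_n}\,dF(x_1, x_2, \ldots, x_n) = (-i)^{k_1 + k_2 + \ldots + k_n} \left[\frac{\partial^{k_1 + k_2 + \ldots + k_n} f(t_1, t_2, \ldots, t_n)}{\partial t_1^{k_1}\,\partial t_2^{k_2} \ldots \partial t_n^{k_n}}\right]_{t_1 = t_2 = \ldots = t_n = 0}$$
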


For the computation of characteristic functions it is useful to 
know the following theorem which the reader will be able to prove 
without any difficulty. 


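Theorem 1. If the characteristic function of a variable $(\xi_1, \xi_2, \ldots, \xi_n)$ is equal to $f(t_1, t_2, \ldots, t_n)$, then the characteristic function of the variable $(\sigma_1 \xi_1 + a_1, \sigma_2 \xi_2 + a_2, \ldots, \sigma_n \xi_n + a_n)$, where $a_i$ and $\sigma_i$ ($1 \le i \le n$) are real constants, is

$$e^{i \sum_{k=1}^{n} a_k t_k} f(\sigma_1 t_1, \sigma_2 t_2, \ldots, \sigma_n t_n)$$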

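Example 1. Let us calculate the characteristic function of a two-dimensional random variable distributed according to the normal law:

$$p(x, y) = \frac{1}{2\pi\sqrt{1 - r^2}} \exp\left\{-\frac{x^2 - 2rxy + y^2}{2(1 - r^2)}\right\} \qquad (3)$$

By formula (2),

$$f(t_1, t_2) = \iint e^{i(t_1 x + t_2 y)} p(x, y)\,dx\,dy$$

Changing variables we can reduce $f(t_1, t_2)$ to the form

$$f(t_1, t_2) = e^{-\frac{1}{2}(t_1^2 + 2r t_1 t_2 + t_2^2)} \cdot \frac{1}{2\pi} \iint e^{-\frac{1}{2}(u^2 + v^2)}\,du\,dv = e^{-\frac{1}{2}(t_1^2 + 2r t_1 t_2 + t_2^2)}$$
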
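Example 2. Applying Theorem 1, we find the characteristic function of the variable $(\eta_1, \eta_2)$ which is distributed according to the normal law:

$$p(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}} \exp\left\{-\frac{1}{2(1 - r^2)} \left[\frac{(x - a)^2}{\sigma_1^2} - 2r\frac{(x - a)(y - b)}{\sigma_1\sigma_2} + \frac{(y - b)^2}{\sigma_2^2}\right]\right\} \qquad (4)$$

If we put $\eta_1 = \sigma_1 \xi_1 + a$, $\eta_2 = \sigma_2 \xi_2 + b$, then the variable $(\xi_1, \xi_2)$ will be distributed according to the law (3). According to Theorem 1, the characteristic function of the variable $(\eta_1, \eta_2)$ is

$$f(t_1, t_2) = \exp\left\{i(a t_1 + b t_2) - \frac{1}{2}\left(\sigma_1^2 t_1^2 + 2r\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right\}$$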

The following theorem is a consequence of the definition of a 
characteristic function. 

Theorem 2. If $f(t_1, t_2, \ldots, t_n)$ is the characteristic function of the variable $(\xi_1, \xi_2, \ldots, \xi_n)$, then the characteristic function of the sum $\xi_1 + \xi_2 + \ldots + \xi_n$ is

$$f(t) = f(t, t, \ldots, t)$$

Note. We notice that

$$f(t) = f(t t_1, t t_2, \ldots, t t_n)$$

is the characteristic function of the sum $t_1 \xi_1 + t_2 \xi_2 + \ldots + t_n \xi_n$.

Example 3. We apply Theorem 2 to determine the distribution of the sum $\eta_1 + \eta_2$ if $(\eta_1, \eta_2)$ is distributed in accordance with the law (4).

According to Theorem 2 the characteristic function of the sum $\eta_1 + \eta_2$ is

$$f(t) = \exp\left\{it(a + b) - \frac{t^2}{2}\left(\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2\right)\right\}$$

We know from Example 1 of Sec. 35 that this is the characteristic function of the normal law with expectation $a + b$ and variance $\sigma_1^2 + 2r\sigma_1\sigma_2 + \sigma_2^2$. Earlier, we obtained this result directly (Example 2,
Sec. 24). 
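Theorem 2 and the rule for marginal characteristic functions can be checked on the two-dimensional normal law (4). In the sketch below the function names and the sample parameter values are assumptions for illustration; the formulas themselves are those of Examples 2 and 3.

```python
import cmath


def bivariate_normal_cf(t1, t2, a, b, s1, s2, r):
    """exp{i(a t1 + b t2) - (s1^2 t1^2 + 2 r s1 s2 t1 t2 + s2^2 t2^2) / 2}."""
    quad = s1 ** 2 * t1 ** 2 + 2 * r * s1 * s2 * t1 * t2 + s2 ** 2 * t2 ** 2
    return cmath.exp(1j * (a * t1 + b * t2) - quad / 2)


def normal_cf(t, mean, var):
    """One-dimensional normal characteristic function exp(i mean t - var t^2 / 2)."""
    return cmath.exp(1j * mean * t - var * t * t / 2)
```

Putting both arguments equal to `t` should reproduce `normal_cf(t, a + b, s1**2 + 2*r*s1*s2 + s2**2)`, and putting the second argument equal to zero should reproduce the characteristic function of the first component, `normal_cf(t, a, s1**2)`.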

At the beginning of this chapter we saw that the characteristic function of a sum of independent random variables is equal to the product of the characteristic functions of the summands. We shall show that this property is only a necessary but not a sufficient condition for the independence of random variables. For this purpose consider the two-dimensional random variable $(\xi, \eta)$, whose density function may be expressed as

$$p(x, y) = p_1(x) p_2(y) + \varphi(x)\psi(y) - \varphi(y)\psi(x)$$

where $p_1(x)$ and $p_2(y)$ are one-dimensional density functions, and $\varphi(x)$ and $\psi(x)$ ($\varphi(x) \not\equiv \psi(x)$) are odd integrable functions. It is easy to see that such density functions exist. Indeed, the function

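$$p(x, y) = \frac{1}{4} e^{-|x| - |y|} \left\{1 + xy\,e^{-|x| - 2|y|} - xy\,e^{-2|x| - |y|}\right\}$$

is just such an instance. It satisfies the inequality $p(x, y) \ge 0$ for all $x$ and $y$ and, moreover,

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y)\,dx\,dy = 1$$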





The random variables $\xi$ and $\eta$ are dependent since their joint density function cannot be expressed as the product of two factors, each of which depends only on one argument $x$ or $y$. The density function of the component $\xi$ is

$$p_\xi(x) = \int_{-\infty}^{\infty} p(x, y)\,dy = p_1(x)$$

and the density function of the component $\eta$ is

$$p_\eta(y) = \int_{-\infty}^{\infty} p(x, y)\,dx = p_2(y)$$

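The two-dimensional characteristic function for the vector $(\xi, \eta)$ is equal to

$$f(t, \tau) = f_1(t) f_2(\tau) + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{itx + i\tau y} \varphi(x)\psi(y)\,dx\,dy - \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{itx + i\tau y} \varphi(y)\psi(x)\,dx\,dy$$

where

$$f_1(t) = \int_{-\infty}^{\infty} e^{itx} p_1(x)\,dx, \qquad f_2(\tau) = \int_{-\infty}^{\infty} e^{i\tau y} p_2(y)\,dy$$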
— oo — 00 

In our particular example, the characteristic function of the vector is

$$f(t, \tau) = \frac{1}{(1+t^2)(1+\tau^2)} - 24\,t\tau\left[ \frac{1}{(4+t^2)^2(9+\tau^2)^2} - \frac{1}{(9+t^2)^2(4+\tau^2)^2} \right]$$

The characteristic function of the sum $\xi + \eta$ is, according to
Theorem 2, equal in the general case to

$$f(t, t) = f_1(t) f_2(t)$$

(for $\tau = t$ the two double integrals cancel, as an interchange of $x$ and $y$ shows);
that is to say, it is equal to the product of the characteristic functions
of the summands. We have thus shown that there exist dependent
random variables for which the characteristic function of the sum is
equal to the product of the characteristic functions of the summands.
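The construction above can be illustrated with one concrete density of the stated form. The sketch below assumes $p_1 = p_2$ equal to the Laplace density $\frac{1}{2}e^{-|x|}$, $\varphi(x)\psi(y) = \frac{1}{4}\,x e^{-2|x|}\, y e^{-3|y|}$, and the standard Fourier transform $\int x e^{-c|x|} e^{itx}\,dx = 4ict/(c^2+t^2)^2$. These choices and all names are assumptions made for illustration.

```python
import math

def f1(t):
    """Characteristic function of the Laplace density (1/2) exp(-|x|)."""
    return 1.0 / (1.0 + t * t)

def odd_transform(t, c):
    """Fourier transform of the odd function x * exp(-c|x|): 4 i c t / (c^2 + t^2)^2."""
    return 4j * c * t / (c * c + t * t) ** 2

def density(x, y):
    """Assumed joint density p1(x) p2(y) + phi(x) psi(y) - phi(y) psi(x)."""
    base = 0.25 * math.exp(-abs(x) - abs(y))
    bump = x * y * (math.exp(-abs(x) - 2 * abs(y)) - math.exp(-2 * abs(x) - abs(y)))
    return base * (1.0 + bump)

def joint_cf(t, tau):
    """Two-dimensional characteristic function of the assumed density."""
    extra = odd_transform(t, 2) * odd_transform(tau, 3) - odd_transform(t, 3) * odd_transform(tau, 2)
    return f1(t) * f1(tau) + 0.25 * extra

# On the diagonal tau = t the extra terms cancel, so the sum behaves as if independent.
print(abs(joint_cf(0.7, 0.7) - f1(0.7) ** 2))
# Off the diagonal the joint function does not factor, so xi and eta are dependent.
print(abs(joint_cf(0.7, 1.3) - f1(0.7) * f1(1.3)))
```

The first printed value should be zero up to rounding, while the second is visibly non-zero.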

It is important to note that in the multidimensional case the
following theorem holds.

Theorem 3. A distribution function $F(x_1, x_2, \ldots, x_n)$ is uniquely
determined by its characteristic function.

The proof of this proposition is based on the inversion formula.

Theorem 4. If $f(t_1, t_2, \ldots, t_n)$ is the characteristic function and
$F(x_1, x_2, \ldots, x_n)$ is the distribution function of the random variable
$(\xi_1, \xi_2, \ldots, \xi_n)$, then

$$P\{a_k \leq \xi_k < b_k,\ k = 1, 2, \ldots, n\} = \lim_{T\to\infty} \frac{1}{(2\pi)^n} \int_{-T}^{T}\int_{-T}^{T}\cdots\int_{-T}^{T} \prod_{k=1}^{n} \frac{e^{-it_k a_k} - e^{-it_k b_k}}{it_k}\, f(t_1, t_2, \ldots, t_n)\,dt_1\,dt_2 \ldots dt_n$$

where $a_k$ and $b_k$ are any real numbers that satisfy only one requirement:
that the probability of falling on the surface of the parallelepiped
$a_k \leq \xi_k < b_k$ $(k = 1, 2, \ldots, n)$ be equal to zero.

Just as in the one-dimensional case, we have the direct and the
converse limit theorems for the characteristic functions. We shall not
dwell on this.


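Example 4. One says that an n-dimensional random variable
$(\xi_1, \xi_2, \ldots, \xi_n)$ has a nondegenerate (proper) n-dimensional normal
distribution if its density function is of the form

$$p(x_1, x_2, \ldots, x_n) = C e^{-\frac{1}{2} Q(x_1, x_2, \ldots, x_n)}$$

where

$$Q(x_1, x_2, \ldots, x_n) = \sum_{i,j} b_{ij}(x_i - a_i)(x_j - a_j)$$

is a positive definite quadratic form, and $C$, $a_i$ and $b_{ij}$ are real
constants.

Simple computations demonstrate * that

$$C = (2\pi)^{-n/2}\sqrt{D}$$

where

$$D = \begin{vmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{vmatrix}$$

Denote by $D_{ij}$ the minor of $D$ which corresponds to the element $b_{ij}$;
then

$$M\xi_j = a_j, \qquad \sigma_j^2 = D\xi_j = \frac{D_{jj}}{D} \qquad (j = 1, 2, \ldots, n)$$

$$r_{ij} = \frac{M(\xi_i - a_i)(\xi_j - a_j)}{\sigma_i\sigma_j} = \frac{D_{ij}}{\sqrt{D_{ii}D_{jj}}} \qquad (i, j = 1, 2, \ldots, n)$$

The determinant $D$ and its principal minors are positive.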


* The usual procedure for such computations is to change variables, which 
reduces the form Q to a sum of squares, and to carry out all the computations 
in the new variables. 






Using ordinary computations it is easy to verify that the
characteristic function of the variable $(\xi_1, \xi_2, \ldots, \xi_n)$ is equal to

$$f(t_1, t_2, \ldots, t_n) = \exp\left\{ i\sum_{j=1}^{n} a_j t_j - \frac{1}{2}\sum_{j=1}^{n}\sum_{k=1}^{n} \sigma_j\sigma_k r_{jk}\, t_j t_k \right\}$$

Thus, an n-dimensional normal distribution is completely determined
by specifying the expectation and variance.

From the expression for the characteristic function of an
n-dimensional normally distributed random variable we see that the
distribution of the variable

$$(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_k})$$

will, for all $1 \leq i_1 < i_2 < \ldots < i_k \leq n$, be a k-dimensional normal
distribution.


EXERCISES 

1. Prove that the functions 

CO 00 

h (0 = 2 = 2 

ft=o *=0 

00 

where and characteristic functions; determine the corre- 

fessa 

spending probability distributions. 

2. Find the characteristic function for the following probability density
functions:

(a) $p(x) = \frac{a}{2} e^{-a|x|}$;

(b) $p(x) = \frac{a}{\pi(a^2 + x^2)}$;

(c) $p(x) = 0$ when $|x| \geq a$, and $p(x) = \frac{a - |x|}{a^2}$ when $|x| \leq a$;

(d) $p(x) = \frac{2\sin^2\frac{ax}{2}}{\pi a x^2}$.

Note. The attentive reader will have noted that Examples (a) and (b) and
also (c) and (d) are, so to speak, inverse.

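3. Prove that the functions

$$\varphi_1(t) = \frac{1}{\cosh t}, \qquad \varphi_2(t) = \frac{t}{\sinh t}, \qquad \varphi_3(t) = \frac{1}{\cosh^2 t}$$

are characteristic functions of the density functions

$$p_1(x) = \frac{1}{2\cosh\frac{\pi x}{2}}, \qquad p_2(x) = \frac{\pi}{4\cosh^2\frac{\pi x}{2}}, \qquad p_3(x) = \frac{x}{2\sinh\frac{\pi x}{2}}$$

respectively.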

4. Find the probability distributions of random variables whose characteristic 
functions are 


/V j. /!_\ <> 1 / \ ^ /jv sin cit 

(a) cos t; (b) cos^ (c) ; (d) 

5. Prove that the function defined by the equations

$$f(t) = f(-t), \qquad f(t + 2a) = f(t), \qquad f(t) = \frac{a - t}{a} \ \text{ for } 0 \leq t \leq a$$

is a characteristic function.

Note. The characteristic functions of Examples 2 (d) and 5 possess the follow' 
ing remarkable property; 

= h (0 for M 

h (0 fb (0 for I /1 > a and ^ ± 2 a, ... 


There thus exist characteristic functions whose values coincide in an
arbitrarily large interval $(-a, a)$ but which are not identically equal. The first instance
of two such characteristic functions was pointed out by B. V. Gnedenko; then
Krein indicated the necessary and sufficient conditions under which the identity
of two characteristic functions follows from their equality in some interval
$(-a, +a)$.

6. Prove that one can find independent random variables $\xi_1$, $\xi_2$, $\xi_3$ such that
the probability distributions of $\xi_2$ and $\xi_3$ are different, while the distribution
functions of the sums $\xi_1 + \xi_2$ and $\xi_1 + \xi_3$ are the same.

Hint. Make use of the results of Examples 2 (a) and 5.

7. Prove that if $f(t)$ is a characteristic function equal to zero when $|t| \geq a$,
then the function $\varphi(t)$ defined by the equations

$$\varphi(t) = \begin{cases} f(t) & \text{when } |t| \leq a \\ \varphi(t + 2a) & \text{when } -\infty < t < \infty \end{cases}$$

is also a characteristic function.

Hint. Make use of the Bochner-Khinchin theorem.

8 . Prove that if / {t) is a characteristic function, then the function 

q) = 

is also a characteristic function. 

9. Prove that if the function $f(t)$ is a characteristic function, then the
function

$$\varphi(t) = \frac{1}{t}\int_0^t f(z)\,dz$$

is also a characteristic function.

10. Prove that for any real characteristic function $\varphi(t)$ the inequality

$$1 - \varphi(2t) \leq 4\{1 - \varphi(t)\}$$

holds and, hence, for any characteristic function $f(t)$ the inequality

$$1 - |f(2t)|^2 \leq 4\{1 - |f(t)|^2\}$$

also holds.

11. Prove that for any real characteristic function $\varphi(t)$ the inequality

$$1 + \varphi(2t) \geq 2\{\varphi(t)\}^2$$

holds.

12. Prove that if $F(x)$ is a distribution function and $f(t)$ the corresponding
characteristic function, then for any value of $x$ the equation

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} f(t) e^{-itx}\,dt = F(x + 0) - F(x - 0)$$

holds.

13. Prove that if $F(x)$ is a distribution function, $f(t)$ the corresponding
characteristic function, and $x_\nu$ are abscissas of jumps in the function $F(x)$, then

$$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{T} |f(t)|^2\,dt = \sum_\nu \{F(x_\nu + 0) - F(x_\nu - 0)\}^2$$

14. Prove that if a random variable has a density function, then its
characteristic function tends to zero as $t \to \infty$.

15. A random variable $\xi$ is Poisson distributed; $M\xi = \lambda$. Prove that as
$\lambda \to \infty$ the distribution of the variable $\frac{\xi - \lambda}{\sqrt{\lambda}}$ tends to the normal law for which
the parameters $a$ and $\sigma$ are $a = 0$, $\sigma = 1$.


16. A random variable $\xi$ has the density function

$$p(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x} & \text{for } x > 0 \end{cases}$$

Prove that as $\alpha \to \infty$ the distribution of the variable $\frac{\beta\xi - \alpha}{\sqrt{\alpha}}$ converges to
the normal distribution with parameters $a = 0$, $\sigma = 1$.

Note. The results of Exercises 15 and 16 permit using tables of the normal
distribution when computing the probabilities $P\{a \leq \xi < b\}$ for large values of $\lambda$
(or $\alpha$). In particular, it turns out that for a chi-square distribution the limiting
relation already gives excellent accuracy for $n \geq 30$. This fact is constantly
utilized in statistics.

17. Prove that if $\varphi(t)$ is a characteristic function and the function $\psi(t)$ is
such that for some sequence $\{h_n\}$ ($h_n \to \infty$ as $n \to \infty$), the products

$$\varphi(t)\,\psi(h_n t) = f_n(t)$$

are also characteristic functions, then the function $\psi(t)$ is a characteristic
function.



CHAPTER 8

The Classical Limit Theorem


Sec. 41. Statement of the Problem


The integral limit theorem of DeMoivre-Laplace that we proved
in Chapter 2 served as the source of a broad range of investigations
of fundamental importance both to the theory of probability
itself and to its multiplicity of applications in the natural sciences,
technology and the economic sciences. To give an idea
of the trend of these investigations, we shall restate the
DeMoivre-Laplace theorem in a somewhat different form. Namely, if, as we
have frequently done, we denote by $\mu_k$ the number of occurrences
of an event $A$ in the $k$th trial, then the number of occurrences
of $A$ in $n$ successive trials is equal to $\sum_{k=1}^{n} \mu_k$. Further, in
Example 5, Sec. 28, we computed that $M\sum_{k=1}^{n}\mu_k = np$ and $D\sum_{k=1}^{n}\mu_k = npq$.
Therefore, the DeMoivre-Laplace theorem may be written
as follows:

$$P\left\{ a \leq \frac{\sum_{k=1}^{n}(\mu_k - M\mu_k)}{\sqrt{\sum_{k=1}^{n} D\mu_k}} < b \right\} \to \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{z^2}{2}}\,dz \qquad (1)$$

as $n \to \infty$. In words: the probability that the sum of deviations of
independent random variables (which take on two values, 0 and 1,
with probabilities respectively equal to $q$ and $p = 1 - q$, $0 < p < 1$)
from their expectations, divided by the square root of the sum of
the variances of the summands, will lie between the limits from $a$ to
$b$ tends to the integral $\frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{z^2}{2}}\,dz$ uniformly in $a$ and $b$ as
the number of summands increases to infinity.
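Relation (1) can be checked directly for Bernoulli summands, since the left-hand side is then an exact binomial sum. The following is a minimal sketch; the function names and the parameter values $n = 1000$, $p = 0.3$, $a = -1$, $b = 1$ are illustrative choices, not taken from the text.

```python
import math

def binom_pmf(n, k, p):
    """P{mu_1 + ... + mu_n = k}, computed through logarithms to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def normalized_sum_prob(n, p, a, b):
    """Exact probability that a <= (sum - np) / sqrt(npq) < b."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if a <= (k - mean) / sd < b)

def normal_integral(a, b):
    """(1 / sqrt(2 pi)) * integral from a to b of exp(-z^2 / 2) dz, via erf."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

exact = normalized_sum_prob(1000, 0.3, -1.0, 1.0)
limit = normal_integral(-1.0, 1.0)
print(exact, limit)
```

The two printed numbers should be close, with a gap that reflects the finite value of $n$.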





The natural question arises: How closely tied up is the relation
(1) with the special choice of summands $\mu_k$? Will it not hold in the
case of weaker restrictions imposed on the distribution functions
of the summands? The statement of this problem and also its solution
belong in the main to P. L. Chebyshev and his pupils A. A. Markov
and A. M. Lyapunov. Their investigations have shown that one
should impose on the summands only the most general restrictions,
the meaning of which depends on the fact that the separate summands
should exert an insignificant effect on the sum. In the next section
we will give a precise statement of this condition. The reasons why
these results are so vastly important in applications lie in the very
essence of mass-scale phenomena, the study of the regularities of which
is, as we have already had occasion to say, the actual subject of the
theory of probability.

One of the most important schemes used to exploit the results
of probability theory in the natural sciences and technology consists
in the following. It is assumed that a process occurs under the influence
of a large number of independently operating random factors,
each of which only to a negligible extent modifies the course of the
phenomenon or process. The investigator who is interested in the process
as a whole, and not the operation of separate factors, observes
only the overall operation of these factors. We illustrate with two typical
examples.

Example 1. Let a measurement be made. The result will unavoidably
be influenced by a large number of factors that generate errors
in the measurement. These will include errors due to the state of the
measuring instrument, which might vary in gross fashion under the
effect of various atmospheric or mechanical factors. There will also
be human errors of the observer caused by peculiarities of vision or
hearing and also those that might be altered slightly due to the psychic
or physical state of the observer, and so forth. Each of these factors
would generate a negligible error. But the measurement is affected
at once by all these errors, the result being an “overall error”.
In other words, the actually observed error of measurement will
be a random variable: the sum of an enormous number of negligibly
small and independent random variables. And though these quantities
are unknown, as also are their distribution functions, their effect on
the results of the measurements is noticeable and for this reason must
be the subject of study.

Example 2. In many industries large batches of identical articles
are produced by the mass-production process. Let us consider some
numerical characteristic of the product we are interested in. Insofar
as the article conforms to certain technical standards, there is a certain
standard value of the characteristic we have chosen. Actually,
however, there is always observed a certain deviation from this standard
value. In a properly organized production process, such deviations
can only be caused by random factors, each of which produces
only an unnoticeable effect. The overall action, however, generates
a noticeable deviation from the norm.

Any number of such instances might be cited.

Thus, there arises the problem of studying regularities peculiar
to sums of a large number of independent random variables, each
of which exerts but a slight effect on the sum. Later on we will make
the meaning of this requirement more precise. Instead of studying
sums of a very large but finite number of summands, we will consider
a sequence of sums with ever larger numbers of summands and assume
that the solutions of the problems we are interested in are given by
limiting distribution functions for a sequence of distribution functions
of the sums. This kind of passage from a finite statement of the
problem to a limiting statement is customary both in modern mathematics
and in many divisions of the natural sciences.

We have thus arrived at a consideration of the following problem:
given a sequence of mutually independent random variables

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$

about which we suppose that they have finite expectations and
variances. From now on we will adhere to the following notations:

$$a_k = M\xi_k, \qquad b_k^2 = D\xi_k, \qquad B_n^2 = \sum_{k=1}^{n} b_k^2 = D\sum_{k=1}^{n}\xi_k$$

The question is: what conditions must be imposed on the variables
$\xi_k$ so that the distribution functions of the sums

$$\frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - a_k) \qquad (2)$$

converge to the normal distribution law? In the next section we
will see that for this purpose it is sufficient that the Lindeberg
condition be satisfied: for any $\tau > 0$

$$\lim_{n\to\infty} \frac{1}{B_n^2}\sum_{k=1}^{n}\int_{|x - a_k| > \tau B_n} (x - a_k)^2\,dF_k(x) = 0$$

where $F_k(x)$ denotes the distribution function of the variable $\xi_k$.
Let us clarify the meaning of this condition.

Denote by $A_k$ the event consisting in the fact that

$$|\xi_k - a_k| > \tau B_n \qquad (k = 1, 2, \ldots, n)$$

and estimate the probability

$$P\left\{ \max_{1 \leq k \leq n} |\xi_k - a_k| > \tau B_n \right\}$$

Since

$$P\left\{ \max_{1 \leq k \leq n} |\xi_k - a_k| > \tau B_n \right\} = P\{A_1 + A_2 + \ldots + A_n\}$$

and

$$P\{A_1 + A_2 + \ldots + A_n\} \leq \sum_{k=1}^{n} P\{A_k\}$$

by noting that

$$P\{A_k\} = \int_{|x - a_k| > \tau B_n} dF_k(x) \leq \frac{1}{(\tau B_n)^2}\int_{|x - a_k| > \tau B_n} (x - a_k)^2\,dF_k(x)$$

we find the inequality

$$P\left\{ \max_{1 \leq k \leq n} |\xi_k - a_k| > \tau B_n \right\} \leq \frac{1}{\tau^2 B_n^2}\sum_{k=1}^{n}\int_{|x - a_k| > \tau B_n} (x - a_k)^2\,dF_k(x)$$

By virtue of the Lindeberg condition, the latter sum tends to zero
as $n \to \infty$ for any constant $\tau > 0$. Thus the Lindeberg condition
is a peculiar kind of demand for the uniform smallness of the terms
$\frac{1}{B_n}(\xi_k - a_k)$ in the sum (2).
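For discrete summands the Lindeberg sum is a finite expression and can be evaluated exactly. The sketch below is an illustration under assumptions of my own: each summand is described by a hypothetical dictionary mapping values to probabilities, and the two test sequences (identical Bernoulli variables, and symmetric variables taking the values $\pm 2^k$) are chosen only to show one case where the condition holds and one where it fails.

```python
import math

def lindeberg_sum(dists, tau):
    """(1 / B_n^2) * sum_k E[(xi_k - a_k)^2 ; |xi_k - a_k| > tau * B_n].

    dists is a list of dicts {value: probability}, one per summand.
    """
    means = [sum(x * p for x, p in d.items()) for d in dists]
    variances = [sum((x - a) ** 2 * p for x, p in d.items())
                 for d, a in zip(dists, means)]
    b_n = math.sqrt(sum(variances))
    tail = sum((x - a) ** 2 * p
               for d, a in zip(dists, means)
               for x, p in d.items()
               if abs(x - a) > tau * b_n)
    return tail / b_n ** 2

bernoulli = {0: 0.7, 1: 0.3}
# Identical bounded summands: the truncated tails vanish once tau * B_n exceeds 1.
print(lindeberg_sum([bernoulli] * 4, 0.1), lindeberg_sum([bernoulli] * 1000, 0.1))

# Summands +-2^k: the last few terms dominate the sum, so the condition fails.
growing = [{-2 ** k: 0.5, 2 ** k: 0.5} for k in range(20)]
print(lindeberg_sum(growing, 0.1))
```

In the second sequence no single term is negligible compared with $B_n$, which is exactly the situation the condition is meant to exclude.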

Let us note once again that the meaning of the conditions that
are sufficient for convergence of the distributions of the sum (2)
to the normal law had already been fully elucidated in the investigations
of A. A. Markov and A. M. Lyapunov.

Sec. 42. Lyapunov's Theorem

We begin by proving the sufficiency of the Lindeberg condition.

Theorem. If a sequence of mutually independent random variables
$\xi_1, \xi_2, \ldots, \xi_n, \ldots$ for any constant $\tau > 0$ satisfies the Lindeberg
condition

$$\lim_{n\to\infty} \frac{1}{B_n^2}\sum_{k=1}^{n}\int_{|x - a_k| > \tau B_n} (x - a_k)^2\,dF_k(x) = 0 \qquad (1)$$

then as $n \to \infty$

$$P\left\{ \frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - a_k) < x \right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz \qquad (2)$$

uniformly in $x$.





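Proof. For brevity we introduce the following notations:

$$\xi_{nk} = \frac{\xi_k - a_k}{B_n}, \qquad F_{nk}(x) = P\{\xi_{nk} < x\}$$

It is obvious that

$$M\xi_{nk} = 0, \qquad D\xi_{nk} = \frac{b_k^2}{B_n^2}$$

and, consequently,

$$\sum_{k=1}^{n} D\xi_{nk} = 1 \qquad (2')$$

It is easy to see that in these notations the Lindeberg condition
becomes

$$\lim_{n\to\infty}\sum_{k=1}^{n}\int_{|x| > \tau} x^2\,dF_{nk}(x) = 0 \qquad (1')$$

The characteristic function of the sum

$$\frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - a_k) = \sum_{k=1}^{n}\xi_{nk}$$

is equal to

$$\varphi_n(t) = \prod_{k=1}^{n} f_{nk}(t)$$

We have to prove that

$$\lim_{n\to\infty}\varphi_n(t) = e^{-\frac{t^2}{2}}$$

For this purpose we first establish that the factors $f_{nk}(t)$ tend
to 1 uniformly in $k$ $(1 \leq k \leq n)$ as $n \to \infty$. Indeed, taking into
account the equation $M\xi_{nk} = 0$, we find:

$$f_{nk}(t) - 1 = \int (e^{itx} - 1 - itx)\,dF_{nk}(x)$$

Since for any real $\alpha$ *

$$|e^{i\alpha} - 1 - i\alpha| \leq \frac{\alpha^2}{2} \qquad (3)$$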

* This inequality and a whole series of similar ones may be derived, for
example, as follows. From the fact that

$$|e^{i\alpha} - 1| = \left| \int_0^\alpha e^{ix}\,dx \right| \leq \alpha \qquad (\alpha > 0)$$





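it follows that

$$|f_{nk}(t) - 1| \leq \frac{t^2}{2}\int x^2\,dF_{nk}(x)$$

Let $\varepsilon$ be an arbitrary positive number; then, clearly,

$$\int x^2\,dF_{nk}(x) = \int_{|x| \leq \varepsilon} x^2\,dF_{nk}(x) + \int_{|x| > \varepsilon} x^2\,dF_{nk}(x) \leq \varepsilon^2 + \int_{|x| > \varepsilon} x^2\,dF_{nk}(x)$$

For $n$ sufficiently large, the latter summand may, by (1'), be
made less than $\varepsilon^2$. Thus, for all sufficiently large $n$ and for $t$ in
any finite interval $|t| \leq T$,

$$|f_{nk}(t) - 1| \leq \varepsilon^2 T^2$$

uniformly in $k$ $(1 \leq k \leq n)$. From this we conclude that

$$\lim_{n\to\infty} f_{nk}(t) = 1 \qquad (4)$$

uniformly in $k$ $(1 \leq k \leq n)$ and that for all sufficiently large $n$,
for $t$ lying in an arbitrary finite interval $|t| \leq T$, the following
inequality holds:

$$|f_{nk}(t) - 1| < \frac{1}{2} \qquad (5)$$

We can therefore, in the interval $|t| \leq T$, write the expansion (log
represents the principal value of the logarithm)

$$\log\varphi_n(t) = \sum_{k=1}^{n}\log f_{nk}(t) = \sum_{k=1}^{n}\log[1 + (f_{nk}(t) - 1)] = \sum_{k=1}^{n}(f_{nk}(t) - 1) + R_n \qquad (6)$$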

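we have the inequality

$$|e^{i\alpha} - 1 - i\alpha| = \left| \int_0^\alpha (e^{ix} - 1)\,dx \right| \leq \int_0^\alpha x\,dx = \frac{\alpha^2}{2}$$

From the latter inequality it follows that

$$\left| e^{i\alpha} - 1 - i\alpha + \frac{\alpha^2}{2} \right| = \left| \int_0^\alpha (e^{ix} - 1 - ix)\,dx \right| \leq \int_0^\alpha |e^{ix} - 1 - ix|\,dx \leq \int_0^\alpha \frac{x^2}{2}\,dx = \frac{\alpha^3}{6} \qquad (3')$$

and so forth.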





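where

$$R_n = \sum_{k=1}^{n}\sum_{s=2}^{\infty}\frac{(-1)^{s-1}}{s}\,(f_{nk}(t) - 1)^s$$

By virtue of (5)

$$|R_n| \leq \sum_{k=1}^{n}\sum_{s=2}^{\infty}\frac{1}{2}|f_{nk}(t) - 1|^s = \frac{1}{2}\sum_{k=1}^{n}\frac{|f_{nk}(t) - 1|^2}{1 - |f_{nk}(t) - 1|} \leq \sum_{k=1}^{n}|f_{nk}(t) - 1|^2$$

Since

$$\sum_{k=1}^{n}|f_{nk}(t) - 1| = \sum_{k=1}^{n}\left| \int (e^{itx} - 1 - itx)\,dF_{nk}(x) \right| \leq \frac{t^2}{2}\sum_{k=1}^{n}\int x^2\,dF_{nk}(x) = \frac{t^2}{2}$$

it follows that

$$|R_n| \leq \frac{t^2}{2}\max_{1 \leq k \leq n}|f_{nk}(t) - 1|$$

From (4) it follows that in an arbitrary finite interval $|t| \leq T$,
as $n$ tends to infinity

$$R_n \to 0 \qquad (7)$$

uniformly in $t$. But

$$\sum_{k=1}^{n}(f_{nk}(t) - 1) = -\frac{t^2}{2} + \rho_n \qquad (8)$$

where

$$\rho_n = \frac{t^2}{2} + \sum_{k=1}^{n}\int (e^{itx} - 1 - itx)\,dF_{nk}(x)$$

Let $\varepsilon$ be any arbitrary positive number; then by (2')

$$\rho_n = \sum_{k=1}^{n}\int_{|x| \leq \varepsilon}\left( e^{itx} - 1 - itx + \frac{t^2x^2}{2} \right)dF_{nk}(x) + \sum_{k=1}^{n}\int_{|x| > \varepsilon}\left( \frac{t^2x^2}{2} + e^{itx} - 1 - itx \right)dF_{nk}(x)$$

The inequalities (3) and (3') permit obtaining the following estimate:

$$|\rho_n| \leq \frac{|t|^3}{6}\sum_{k=1}^{n}\int_{|x| \leq \varepsilon}|x|^3\,dF_{nk}(x) + t^2\sum_{k=1}^{n}\int_{|x| > \varepsilon} x^2\,dF_{nk}(x) \leq$$

$$\leq \frac{|t|^3\varepsilon}{6}\sum_{k=1}^{n}\int_{|x| \leq \varepsilon} x^2\,dF_{nk}(x) + t^2\sum_{k=1}^{n}\int_{|x| > \varepsilon} x^2\,dF_{nk}(x) = \frac{|t|^3}{6}\varepsilon + t^2\left( 1 - \frac{|t|\varepsilon}{6} \right)\sum_{k=1}^{n}\int_{|x| > \varepsilon} x^2\,dF_{nk}(x)$$

According to the condition (1'), the second summand may be
made less than any $\eta > 0$ for any $\varepsilon > 0$, so long as $n$ is sufficiently
large. And since $\varepsilon$ is an arbitrary positive number, we can
select it so small that, no matter what $\eta > 0$ and $T$, for all $t$
within the interval $|t| \leq T$ the following inequality will hold:

$$|\rho_n| < 2\eta \qquad (n \geq n_0(\varepsilon, \eta, T))$$

This inequality shows that

$$\lim_{n\to\infty}\rho_n = 0 \qquad (9)$$

uniformly in every finite interval of values of $t$. Collecting together
the relations (6), (7), (8) and (9), we finally find that

$$\lim_{n\to\infty}\log\varphi_n(t) = -\frac{t^2}{2}$$

uniformly in every finite interval of $t$. The theorem is proved.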
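The convergence of $\varphi_n(t)$ to $e^{-t^2/2}$ established in the proof can be observed numerically in the simplest case. The sketch below assumes identically distributed summands taking the values 0 and 1 with probabilities $q$ and $p$, so that every factor $f_{nk}(t)$ is the same explicit expression; the function name and the chosen values of $t$, $n$ and $p$ are illustrative.

```python
import cmath
import math

def phi_n(t, n, p):
    """Characteristic function of (1/B_n) * sum(xi_k - p) for n Bernoulli(p) summands."""
    q = 1.0 - p
    b_n = math.sqrt(n * p * q)
    # One factor f_nk(t) = M exp{i t (xi_k - p) / B_n}; all n factors coincide here.
    factor = q * cmath.exp(-1j * t * p / b_n) + p * cmath.exp(1j * t * q / b_n)
    return factor ** n

target = math.exp(-0.5)  # value of exp(-t^2 / 2) at t = 1
for n in (100, 10_000, 1_000_000):
    print(n, abs(phi_n(1.0, n, 0.3) - target))
```

The printed distances should shrink as $n$ grows.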

Corollary. If the independent random variables $\xi_1, \xi_2, \ldots, \xi_n, \ldots$
are identically distributed and have a finite variance different
from zero, then as $n$ tends to infinity

$$P\left\{ \frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - M\xi_k) < x \right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz$$

uniformly in $x$.

Proof. It suffices to verify that the Lindeberg condition is
satisfied under the given assumptions. For this purpose, we note
that in our case

$$B_n = b\sqrt{n}$$

where $b^2$ denotes the variance of a separate summand. Putting
$M\xi_k = a$, we can write the following obvious equations:

$$\frac{1}{B_n^2}\sum_{k=1}^{n}\int_{|x - a| > \tau B_n}(x - a)^2\,dF_k(x) = \frac{1}{nb^2}\,n\int_{|x - a| > \tau B_n}(x - a)^2\,dF_1(x) = \frac{1}{b^2}\int_{|x - a| > \tau B_n}(x - a)^2\,dF_1(x)$$

From the assumption that the variance is finite and positive we
conclude that the integral on the right-hand side of this equation
tends to zero as $n$ tends to infinity.

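Lyapunov's Theorem. If for a sequence of mutually independent
random variables $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ it is possible to choose a positive
number $\delta > 0$ such that as $n \to \infty$

$$\frac{1}{B_n^{2+\delta}}\sum_{k=1}^{n} M|\xi_k - a_k|^{2+\delta} \to 0 \qquad (10)$$

then as $n$ tends to infinity

$$P\left\{ \frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - a_k) < x \right\} \to \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{z^2}{2}}\,dz$$

uniformly in $x$.

Proof. Again, it will suffice to verify that the Lyapunov condition
[condition (10)] implies that the Lindeberg condition holds.
But this is clear from the following chain of inequalities:

$$\frac{1}{B_n^2}\sum_{k=1}^{n}\int_{|x - a_k| > \tau B_n}(x - a_k)^2\,dF_k(x) \leq \frac{1}{B_n^2(\tau B_n)^\delta}\sum_{k=1}^{n}\int_{|x - a_k| > \tau B_n}|x - a_k|^{2+\delta}\,dF_k(x) \leq \frac{1}{\tau^\delta}\cdot\frac{\sum_{k=1}^{n} M|\xi_k - a_k|^{2+\delta}}{B_n^{2+\delta}}$$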
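The Lyapunov ratio in condition (10) is easy to evaluate when the summands take only the values 0 and 1. The sketch below assumes independent summands $\xi_k$ with $P\{\xi_k = 1\} = p_k$, for which $M|\xi_k - p_k|^{2+\delta} = q_k p_k^{2+\delta} + p_k q_k^{2+\delta}$; the function name and the default $\delta = 1$ are illustrative choices.

```python
import math

def lyapunov_ratio(ps, delta=1.0):
    """(1 / B_n^(2+delta)) * sum_k M|xi_k - p_k|^(2+delta) for 0-1 summands."""
    b_n_squared = sum(p * (1.0 - p) for p in ps)
    # |xi_k - p_k| equals p_k with probability q_k and q_k with probability p_k.
    moments = sum((1.0 - p) * p ** (2 + delta) + p * (1.0 - p) ** (2 + delta)
                  for p in ps)
    return moments / b_n_squared ** (1.0 + delta / 2.0)

for n in (100, 10_000, 1_000_000):
    print(n, lyapunov_ratio([0.3] * n))
```

For identical summands and $\delta = 1$ the ratio equals $(p^2 + q^2)/\sqrt{npq}$, so it decreases like $1/\sqrt{n}$ and condition (10) is satisfied.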


Sec. 43. The Local Limit Theorem 


We shall now indicate the sufficient conditions for application of
the other classical limit theorem: the local theorem. In doing so, we
will confine ourselves to considering only the case of mutually
independent summands having one and the same probability
distribution.

Let us agree to say that a discrete random variable $\xi$ has a lattice
distribution if there exist numbers $a$ and $h > 0$ such that all possible
values of $\xi$ may be represented in the form $a + kh$, where the parameter $k$
can assume any integral values $(-\infty < k < \infty)$.

The Poisson, Bernoulli and other distributions are lattice
distributions.

Let us now express the lattice nature of a distribution of a random
variable $\xi$ in terms of characteristic functions. For this purpose we
prove the following lemma.

Lemma. For a random variable $\xi$ to have a lattice distribution it is
necessary and sufficient that for some $t \neq 0$ the absolute value of its
characteristic function be equal to unity.

Proof. Indeed, if $\xi$ is lattice-distributed and $p_k$ is the probability
of the equation $\xi = a + kh$, then the characteristic function of the
variable $\xi$ is equal to

$$f(t) = \sum_{k=-\infty}^{\infty} p_k e^{i(a + kh)t} = e^{iat}\sum_{k=-\infty}^{\infty} p_k e^{itkh}$$

From here we find

$$f\left( \frac{2\pi}{h} \right) = e^{2\pi i\frac{a}{h}}\sum_{k=-\infty}^{\infty} p_k e^{2\pi ik} = e^{2\pi i\frac{a}{h}}$$

We thus see that for every lattice distribution

$$\left| f\left( \frac{2\pi}{h} \right) \right| = 1$$

Now suppose that for some $t_1 \neq 0$

$$|f(t_1)| = 1$$

and prove that the variable $\xi$ then has a lattice distribution.
The last equation implies that for some $\theta$

$$f(t_1) = e^{i\theta}$$

Thus,

$$\int e^{it_1x}\,dF(x) = e^{i\theta}$$

and, consequently,

$$\int e^{i(t_1x - \theta)}\,dF(x) = 1$$

From this it follows that

$$\int \cos(t_1x - \theta)\,dF(x) = 1$$

For this equation to be possible, it is necessary that the function
$F(x)$ be allowed to grow only for those values of $x$ for which

$$\cos(t_1x - \theta) = 1$$

This means that the possible values of $\xi$ must be of the form

$$x = \frac{\theta}{t_1} + k\frac{2\pi}{t_1}$$

Q.E.D.

We will call the number $h$ the span of the distribution. The
distribution span $h$ is a maximum if, no matter what the choice of $b$
$(-\infty < b < \infty)$ and $h_1 > h$, it is impossible to represent all possible values
of $\xi$ in the form $b + kh_1$.

To illustrate the difference between the concepts of distribution
span and maximal distribution span, we consider the following example.
Let $\xi$ assume all odd numbers as its values. Obviously, all values
of $\xi$ may be written as $a + kh$, where $a = 0$ and $h = 1$. The span $h$ cannot,
however, be maximal, since all possible values of $\xi$ may also be
written as $b + kh_1$, where $b = 1$ and $h_1 = 2$.
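The example of odd values can be reproduced with a small computation. The sketch below assumes an integer-valued variable with finitely many possible values (a finite set of odd numbers with arbitrarily chosen probabilities), computes the maximal span as the greatest common divisor of the differences of the possible values, and evaluates the characteristic function at a few points; all names and numbers are illustrative.

```python
import cmath
import math
from functools import reduce

def maximal_span(values):
    """Maximal span of an integer lattice: gcd of the differences of possible values."""
    first = values[0]
    return reduce(math.gcd, (abs(v - first) for v in values[1:]))

def cf(t, dist):
    """Characteristic function of a discrete distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())

odd_dist = {-3: 0.1, -1: 0.4, 1: 0.3, 3: 0.2}   # assumed probabilities
h_max = maximal_span(list(odd_dist))

print(h_max)                                  # maximal span
print(abs(cf(2 * math.pi / 1, odd_dist)))     # modulus at 2*pi/h for the span h = 1
print(abs(cf(2 * math.pi / h_max, odd_dist))) # modulus at 2*pi/h for the maximal span
print(abs(cf(math.pi / 2, odd_dist)))         # a point strictly between 0 and 2*pi/h_max
```

The modulus equals unity at $2\pi/h$ for both spans, but only for the maximal span is it below unity everywhere strictly between 0 and $2\pi/h$.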

The conditions for a distribution span to be maximal may be
expressed otherwise.

Firstly, the span $h$ may be maximal if and only if the greatest common
divisor of paired differences of possible values of the variable $\xi$
divided by $h$ is equal to unity.

Secondly, the span $h$ is maximal if and only if the absolute
value of the characteristic function is less than unity in the
interval $0 < |t| < \frac{2\pi}{h}$ and is equal to unity when $t = \frac{2\pi}{h}$.

The latter assertion is a straightforward consequence of the
lemma that has just been proved. Indeed, if for $0 < t_1 < \frac{2\pi}{h}$

$$|f(t_1)| = 1$$

then according to the proof the quantity $\frac{2\pi}{t_1}$ must be the
distribution span, and since

$$\frac{2\pi}{t_1} > h$$

the span $h$ cannot be maximal.

From this we can conclude that if $h$ is a maximal distribution
span, then for each $\varepsilon > 0$ there will be a number $c_0 > 0$ such
that for all $t$ in the interval $\varepsilon \leq |t| \leq \frac{2\pi}{h} - \varepsilon$ the following
inequality will hold:

$$|f(t)| \leq e^{-c_0} \qquad (1)$$





Now let the random variables $\xi_1, \xi_2, \ldots, \xi_n, \ldots$ be mutually
independent, lattice-distributed and have one and the same
distribution function $F(x)$. Consider the sum

$$\zeta_n = \xi_1 + \xi_2 + \ldots + \xi_n$$

It is obviously also a lattice random variable and its possible values
can be written as $na + kh$. Denote by $P_n(k)$ the probability of the
equation

$$\zeta_n = na + kh$$

in particular, $P_1(k) = P\{\xi_1 = a + kh\} = p_k$.
Further denote

$$z_{nk} = \frac{an + kh - A_n}{B_n}$$

where $A_n = M\zeta_n = nM\xi_1$, $B_n^2 = D\zeta_n = nD\xi_1$.

We can now prove the following proposition which in obvious
fashion generalizes the local limit theorem of DeMoivre-Laplace.

Theorem*. Let the independent lattice random variables

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$

have one and the same distribution function $F(x)$ and let their
expectations and variances be finite. Then for the relation

$$\frac{B_n}{h}P_n(k) - \frac{1}{\sqrt{2\pi}}e^{-\frac{z_{nk}^2}{2}} \to 0$$

to hold and be uniform in $k$ $(-\infty < k < \infty)$ as $n$ tends to infinity, it is
necessary and sufficient that the distribution span $h$ be maximal.

Proof. The necessity of the hypothesis is almost obvious. Indeed,
if the span $h$ is not maximal, then the possible values of the sum
$\sum_{k=1}^{n}\xi_k$ will have systematic omissions: the difference between the
closest possible values of the sum cannot be less than $dh$, where $d$
is the greatest common divisor of the differences of possible values
of $\xi_n$ divided by $h$. If $h$ is not the maximal span, then $d > 1$ for all
values of $n$.

Proof of the sufficiency of the hypothesis requires a somewhat more
complicated argument.

The characteristic function of the variable $\xi_k$ $(k = 1, 2, 3, \ldots)$ is

$$f(t) = \sum_{k=-\infty}^{\infty} p_k e^{iat + itkh} = e^{iat}\sum_{k=-\infty}^{\infty} p_k e^{itkh}$$

* This theorem was proved by B. V. Gnedenko.— Ed. 





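and the characteristic function of the sum $\zeta_n$ is

$$f^n(t) = e^{iant}\sum_{k=-\infty}^{\infty} P_n(k)\,e^{itkh}$$

Multiplying the last equation by $e^{-iant - itkh}$ and integrating it
from $-\frac{\pi}{h}$ to $\frac{\pi}{h}$ we get

$$\frac{2\pi}{h}P_n(k) = \int_{-\pi/h}^{\pi/h} f^n(t)\,e^{-iant - itkh}\,dt$$

Noting that

$$hk = z_{nk}B_n + A_n - an$$

(we will write $z$ in place of $z_{nk}$), we can write

$$\frac{2\pi}{h}P_n(k) = \int_{-\pi/h}^{\pi/h} e^{-itzB_n}\,[f^*(t)]^n\,dt$$

where

$$f^*(t) = e^{-\frac{itA_n}{n}} f(t)$$

Finally, putting $x = tB_n$, we obtain

$$\frac{2\pi B_n}{h}P_n(k) = \int_{-\pi B_n/h}^{\pi B_n/h} e^{-izx}\left[ f^*\left( \frac{x}{B_n} \right) \right]^n dx$$

It is easy to calculate that

$$\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-izx - \frac{x^2}{2}}\,dx$$

Let us represent the difference

$$R_n = 2\pi\left[ \frac{B_n}{h}P_n(k) - \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}} \right]$$

in the form of a sum of four integrals:

$$R_n = J_1 + J_2 + J_3 + J_4$$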



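where

$$J_1 = \int_{-A}^{A} e^{-izx}\left[ f^{*n}\left( \frac{x}{B_n} \right) - e^{-\frac{x^2}{2}} \right] dx$$

$$J_2 = -\int_{|x| > A} e^{-izx - \frac{x^2}{2}}\,dx$$

$$J_3 = \int_{\varepsilon B_n \leq |x| \leq \frac{\pi B_n}{h}} e^{-izx} f^{*n}\left( \frac{x}{B_n} \right) dx$$

$$J_4 = \int_{A \leq |x| < \varepsilon B_n} e^{-izx} f^{*n}\left( \frac{x}{B_n} \right) dx$$

where $A > 0$ is a sufficiently large constant and $\varepsilon > 0$ is a sufficiently
small constant, the more precise values of which will be
chosen later on.

By virtue of the corollary to the theorem proved in the preceding
section, in any finite interval of values of $t$ the relation

$$f^{*n}\left( \frac{t}{B_n} \right) \to e^{-\frac{t^2}{2}} \qquad (n \to \infty)$$

holds uniformly in $t$. But from this it follows that whatever the
constant $A$,

$$J_1 \to 0 \qquad (n \to \infty)$$

The integral $J_2$ is estimated by means of the inequality

$$|J_2| \leq \int_{|x| > A} e^{-\frac{x^2}{2}}\,dx \leq \frac{2}{A}\int_A^\infty x e^{-\frac{x^2}{2}}\,dx = \frac{2}{A}e^{-\frac{A^2}{2}}$$

Choosing $A$ sufficiently large, we can make $J_2$ as small as we like.
By the inequality (1) we have

$$|J_3| \leq \int_{\varepsilon B_n \leq |x| \leq \frac{\pi B_n}{h}}\left| f^*\left( \frac{x}{B_n} \right) \right|^n dx \leq \frac{2\pi B_n}{h}\,e^{-nc_0}$$

Whence it is clear that as $n$ tends to infinity

$$J_3 \to 0$$

To estimate the integral $J_4$ we note that the existence of variance
implies the existence of the second derivative of the function
$f^*(t)$. We can therefore, in accordance with (3) of Sec. 35, make
use of the expansion

$$f^*(t) = 1 - \frac{\sigma^2t^2}{2} + o(t^2) \qquad (\sigma^2 = D\xi_1)$$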





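in the neighbourhood of the point $t = 0$; and for $|t| \leq \varepsilon$, if $\varepsilon$ is
sufficiently small, we get

$$|f^*(t)| < 1 - \frac{\sigma^2t^2}{4} < e^{-\frac{\sigma^2t^2}{4}}$$

Then, for $|t| \leq \varepsilon B_n$,

$$\left| f^*\left( \frac{t}{B_n} \right) \right|^n < e^{-\frac{n\sigma^2t^2}{4B_n^2}} = e^{-\frac{t^2}{4}}$$

And so

$$|J_4| \leq 2\int_A^{\varepsilon B_n} e^{-\frac{t^2}{4}}\,dt < 2\int_A^\infty e^{-\frac{t^2}{4}}\,dt$$

By choosing $A$ sufficiently large we can make the integral $J_4$ as small
as we desire. The theorem is proved.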
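The statement of the local theorem can be examined numerically for Bernoulli summands, where $a = 0$, $h = 1$ (the maximal span), $A_n = np$, $B_n = \sqrt{npq}$ and $P_n(k)$ is a binomial probability. The sketch below computes the largest deviation over $k$; the function names and the values of $n$ and $p$ are illustrative choices.

```python
import math

def binom_pmf(n, k, p):
    """P_n(k) for Bernoulli summands, computed through logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def local_error(n, p):
    """max over k of |(B_n / h) P_n(k) - exp(-z_nk^2 / 2) / sqrt(2 pi)| with h = 1."""
    a_n = n * p
    b_n = math.sqrt(n * p * (1.0 - p))
    worst = 0.0
    for k in range(n + 1):
        z = (k - a_n) / b_n
        normal_density = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
        worst = max(worst, abs(b_n * binom_pmf(n, k, p) - normal_density))
    return worst

for n in (50, 500, 2000):
    print(n, local_error(n, 0.3))
```

The printed deviations should decrease as $n$ grows, in line with uniform convergence in $k$.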

There is yet another case when it is natural to pose the question
of the local behaviour of the distribution functions of sums. This
is the case of continuous distributions.

The question is: When do the density functions of normalized sums
converge to the normal density function if the corresponding
distribution functions converge to the normal distribution? This problem
is exhaustively solved in the following theorem.

Theorem. Let the independent random variables

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$

have one and the same distribution function $F(x)$; their expectations
and variances are finite and, beginning with a certain $n_0$, the
random variable

$$\frac{1}{B_n}\sum_{k=1}^{n}(\xi_k - M\xi_k)$$

has a density function $p_n(x)$. So that as $n$ tends to infinity

$$p_n(x) - \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \to 0$$

uniformly in $x$ $(-\infty < x < \infty)$, it is necessary and sufficient that there
exist a number $n_1$ such that the function $p_{n_1}(x)$ is bounded.

We shall not give the proof of this theorem since it repeats to a great
extent the reasoning that has just been given and rests on Theorem 3
of Sec. 36 and the lemma of Sec. 39.





EXERCISES 


1. Prove that as n tends to infinity 



Hint. Apply Lyapunov’s theorem to the chi-square distribution. 

2. The random variables 



—with probability 
+ with probability 


are independent. Prove that for a > —— the Lyapunov theorem may be ap¬ 
plied to them 

3, Prove that as n oo, 


n 



Hint. Apply the Lyapunov theorem to the sum of Poisson distributed
random variables with parameter $\lambda = 1$.

4. The probability of occurrence of an event $A$ in the $i$th trial is equal
to $p_i$; $\mu$ is the number of occurrences of $A$ in $n$ independent trials. Prove that



if and only if 2 

t=i 

5. Prove that under the conditions of the preceding problem, the require- 

OO 

ment that is sufficient not only for the integral theorem but 

t=i 

for the local theorem as well. 



The Theory of Infinitely 
Divisible Distribution Laws 


For a long time the central problem of probability theory was
considered to be the finding of the most general conditions under which
the distribution functions of sums of independent random variables
converge to the normal law. Extremely general conditions sufficient
for this convergence were found by A. M. Lyapunov (see Chap. 8).
Attempts to expand Lyapunov's conditions were successful only in
recent years when conditions were found that are not only sufficient
but also, under extremely natural restrictions, necessary.

In parallel with the consummation of the classical range of problems,
there developed a new trend in the theory of limit theorems
for the sums of independent random variables closely associated with
the introduction and development of the theory of stochastic (random)
processes. The first question to arise was: What laws, in addition to
the normal law, may be limit laws for sums of independent random
variables?

It was found that the class of limit laws is by far not exhausted
by the normal law. Then the question arose of defining the conditions
that must be imposed on the summands so that the distribution functions
of the sums converge to one or another limit law.

In the present chapter our purpose is to describe some investigations
of recent years devoted to limit theorems for sums of independent
random variables. Here, we confine ourselves to the case when the
summands have finite variances. Consideration of the problem without
this restriction demands more cumbersome calculations; we refer
the interested reader to its solution in the monograph by Gnedenko
and Kolmogorov that was mentioned earlier. As a simple consequence
of the general theorems that we present, we will obtain the earlier
mentioned necessary and sufficient condition for the convergence of
the distribution functions of sums to the normal law.





Chap. 9. The Theory of Infinitely Divisible Distribution Laws 


Sec. 44. Infinitely Divisible Laws and Their Basic Properties 


A distribution law O (a:) is called infinitely divisible if, no matter 
what natural number n is taken, the random variable distributed in 
accordance with the 0 (x) law is the sum of n independent random 
variables |i, ia, • • •, In with one and the same distribution law 
On{x) (dependent on number of summands n). 

It is clear that this definition is equivalent to the following: the 
law 0(x) is called infinitely divisible if for any n its characteristic 
function is the nth power of some other characteristic function. 

Researches in recent years have shown that infinitely divisible 
laws play a significant role in a variety of problems of probability 
theory. For one thing, it has turned out that the class of limit laws 
for sums of independent random variables coincides with the class 
of infinitely divisible laws. 

We now take up the properties of infinitely divisible laws that
will be needed later on. We begin with the proof that the normal and
the Poisson laws are infinitely divisible. Indeed, the characteristic
function of the normal law with expectation $a$ and variance $\sigma^2$ is
equal to

$$\varphi(t) = e^{iat - \frac{\sigma^2 t^2}{2}}$$

For any $n$, the $n$th root of $\varphi(t)$ is again the characteristic function
of a normal law, but with expectation $\frac{a}{n}$ and variance $\frac{\sigma^2}{n}$.

We will generalize somewhat the earlier encountered concept of
Poisson’s law and we will say that a random variable $\xi$ is Poisson
distributed if it can assume only the values $ak + b$, where $a$ and $b$ are
real constants and $k = 0, 1, 2, \ldots$, and

$$\mathsf{P}\{\xi = ak + b\} = \frac{\lambda^k e^{-\lambda}}{k!} \tag{1}$$

where $\lambda$ is a positive constant. It is easy to calculate that the
characteristic function for the law (1) is given by the formula

$$\varphi(t) = e^{\lambda (e^{iat} - 1) + ibt}$$

We see that for any $n$, the $n$th root of $\varphi(t)$ is again the
characteristic function of Poisson’s law but with different parameters:
$a$, $\frac{\lambda}{n}$ and $\frac{b}{n}$.
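The two claims about $n$th roots can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the parameter values are chosen only for illustration, and the check compares $\{\varphi_n(t)\}^n$ with $\varphi(t)$ on a grid of $t$ for both the normal law and the generalized Poisson law.

```python
import cmath

def poisson_cf(t, lam, a=1.0, b=0.0):
    """Characteristic function of the generalized Poisson law (values a*k + b)."""
    return cmath.exp(lam * (cmath.exp(1j * a * t) - 1) + 1j * b * t)

def normal_cf(t, mean, var):
    """Characteristic function of the normal law with the given mean and variance."""
    return cmath.exp(1j * mean * t - var * t * t / 2)

def max_gap(n, ts, lam=3.0, a=2.0, b=0.5, mean=1.0, var=4.0):
    """Largest deviation of {phi_n(t)}^n from phi(t) on the grid ts, over both laws.

    phi_n is the same law with parameters (a, lam/n, b/n), respectively
    (mean/n, var/n); all numeric values are illustrative assumptions.
    """
    gap = 0.0
    for t in ts:
        gap = max(gap, abs(poisson_cf(t, lam / n, a, b / n) ** n
                           - poisson_cf(t, lam, a, b)))
        gap = max(gap, abs(normal_cf(t, mean / n, var / n) ** n
                           - normal_cf(t, mean, var)))
    return gap

ts = [k / 10 for k in range(-50, 51)]
gaps = {n: max_gap(n, ts) for n in (2, 5, 17)}
print(gaps)  # deviations at the level of floating-point rounding
```

Since the identities are exact, any deviation printed here comes from rounding alone.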

Theorem 1. The characteristic function of an infinitely divisible 
law does not vanish. 

Proof. Let $\Phi(x)$ be an infinitely divisible law and $\varphi(t)$ its
characteristic function. Then, by definition, for any $n$ we have the
equation

$$\varphi(t) = \{\varphi_n(t)\}^n \tag{2}$$

where $\varphi_n(t)$ is some characteristic function. By virtue of the
continuity of the function $\varphi(t)$ there exists a range of values of the
argument $|t| \le a$ in which $\varphi(t) \ne 0$; clearly, in this same region
$\varphi_n(t) \ne 0$.

For $n$ sufficiently large, we can make the quantity
$|\varphi_n(t)| = \sqrt[n]{|\varphi(t)|}$ arbitrarily close to unity uniformly
in $t$ ($|t| \le a$).

Now take two mutually independent random variables $\eta_1$ and $\eta_2$
distributed in accordance with some law $F(x)$ and consider their
difference $\eta = \eta_1 - \eta_2$. The characteristic function of the
variable $\eta$ is

$$\mathsf{M} e^{it\eta} = |\mathsf{M} e^{it\eta_1}|^2 = |f(t)|^2$$

where $f(t)$ is the characteristic function of the law $F(x)$. We thus see
that the square of the absolute value of any characteristic function is
a characteristic function.

Further, since a real characteristic function is of the form

$$f(t) = \int \cos xt \, dF(x)$$

it therefore follows that we can write the inequality

$$\begin{aligned}
1 - f(2t) &= \int (1 - \cos 2xt)\, dF(x) = 2\int \sin^2 xt \, dF(x) \\
&= 2\int (1 - \cos xt)(1 + \cos xt)\, dF(x) \le 4\int (1 - \cos xt)\, dF(x) = 4(1 - f(t))
\end{aligned}$$
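As a quick sanity check of the inequality $1 - f(2t) \le 4(1 - f(t))$, here is a small sketch that evaluates both sides on a grid for three real characteristic functions. The choice of laws, the grid and the tolerance are assumptions made for illustration only.

```python
import math

# Real characteristic functions of three symmetric laws (illustrative choice).
REAL_CFS = {
    "symmetric two-point law": lambda t: math.cos(t),
    "standard normal law": lambda t: math.exp(-t * t / 2),
    "Laplace law": lambda t: 1 / (1 + t * t),
}

def inequality_holds(f, ts, tol=1e-12):
    """Check 1 - f(2t) <= 4 (1 - f(t)) at every grid point, up to rounding."""
    return all(1 - f(2 * t) <= 4 * (1 - f(t)) + tol for t in ts)

ts = [k / 100 for k in range(-1000, 1001)]
results = {name: inequality_holds(f, ts) for name, f in REAL_CFS.items()}
print(results)
```

A grid check of this kind is only an illustration; the proof in the text covers every real characteristic function.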

From the foregoing we see that the function $|\varphi_n(t)|^2$ satisfies
the inequality

$$1 - |\varphi_n(2t)|^2 \le 4(1 - |\varphi_n(t)|^2)$$

From this inequality it follows that if $n$ is so large that
$1 - |\varphi_n(t)| < \varepsilon$ for $|t| \le a$, then in this region

$$1 - |\varphi_n(2t)| \le 1 - |\varphi_n(2t)|^2 \le 4(1 - |\varphi_n(t)|^2) \le 8(1 - |\varphi_n(t)|) < 8\varepsilon$$

Summarizing, in the region $|t| \le 2a$

$$1 - |\varphi_n(t)| < 8\varepsilon$$

Thus, for $n$ sufficiently large in the region $|t| \le 2a$, $\varphi_n(t)$ and
also $\varphi(t)$ do not vanish.

In similar fashion we prove that $\varphi(t) \ne 0$ in the region $|t| \le 4a$,
and so on.

This proves our theorem. 

Theorem 2. The distribution function of a sum of independent random
variables having infinitely divisible distribution functions is also
infinitely divisible.

Proof. It is obviously sufficient to confine oneself to the case of two
summands in order to prove the theorem. If $\varphi(t)$ and $\psi(t)$ are the
characteristic functions of the summands, then by hypothesis we have,
for any $n$,

$$\varphi(t) = \{\varphi_n(t)\}^n, \qquad \psi(t) = \{\psi_n(t)\}^n$$

where $\varphi_n(t)$ and $\psi_n(t)$ are characteristic functions. Therefore, the
characteristic function of a sum, for any $n$, satisfies the equation

$$\chi(t) = \varphi(t) \cdot \psi(t) = \{\varphi_n(t) \cdot \psi_n(t)\}^n$$


Theorem 3. The limit distribution function (in the meaning of being
weakly convergent) of a sequence of infinitely divisible distribution
functions is itself infinitely divisible.

Proof. Let the sequence $\Phi^{(k)}(x)$ of infinitely divisible distribution
functions be weakly convergent to the distribution function $\Phi(x)$.
Then

$$\lim_{k \to \infty} \varphi^{(k)}(t) = \varphi(t) \tag{3}$$

uniformly in each finite interval of $t$. By hypothesis, for any $n$, the
functions (the $n$th root is understood to be its principal value)

$$\varphi_n^{(k)}(t) = \sqrt[n]{\varphi^{(k)}(t)} \tag{4}$$

are characteristic functions. From (3) we conclude that for every $n$

$$\lim_{k \to \infty} \varphi_n^{(k)}(t) = \varphi_n(t) \tag{5}$$

The continuity of $\varphi_n(t)$ follows from the continuity of $\varphi(t)$. By
virtue of the limit theorem for characteristic functions, $\varphi_n(t)$ is a
characteristic function. From (3), (4) and (5) we find that for every $n$
we have the equation

$$\varphi(t) = \{\varphi_n(t)\}^n$$

Q.E.D.


Sec. 45. The Canonical Representation of Infinitely 
Divisible Laws 

From now on we confine ourselves to the study of infinitely divisible
laws with finite variance. The purpose of this section is to prove
the following theorem, which was found in 1932 by A. N. Kolmogorov
and which gives a complete description of the class of distribution
laws that interests us.

Theorem. For a distribution function $\Phi(x)$ with finite variance to be
infinitely divisible, it is necessary and sufficient that the logarithm of
its characteristic function have the form

$$\log \varphi(t) = i\gamma t + \int \{e^{itx} - 1 - itx\} \frac{1}{x^2}\, dG(x) \tag{1}$$

where $\gamma$ is a real constant and $G(x)$ is a nondecreasing function of
bounded variation.

Proof. First assume that $\Phi(x)$ is an infinitely divisible law and
$\varphi(t)$ is its characteristic function. Then for any $n$

$$\varphi(t) = \{\varphi_n(t)\}^n$$

where $\varphi_n(t)$ is some characteristic function. Since $\varphi(t) \ne 0$, this
equation is equivalent to the following one:

$$\log \varphi(t) = n \log \varphi_n(t) = n \log [1 + (\varphi_n(t) - 1)]$$

For any $T$, as $n$ tends to infinity,

$$\varphi_n(t) \to 1$$

uniformly in the interval $|t| \le T$; for this reason, in any finite
interval of values of $t$ the quantity $|\varphi_n(t) - 1|$ can be made less than
any preassigned number so long as $n$ is sufficiently great. We can
therefore take advantage of the equation

$$\log [1 + (\varphi_n(t) - 1)] = (\varphi_n(t) - 1)(1 + o(1))$$

which yields

$$\log \varphi(t) = \lim_{n \to \infty} n(\varphi_n(t) - 1) = \lim_{n \to \infty} n \int (e^{itx} - 1)\, d\Phi_n(x) \tag{2}$$

where $\Phi_n(x)$ is a distribution function having $\varphi_n(t)$ as its
characteristic function. From the definition of expectation and from the
relation between the functions $\Phi_n(x)$ and $\Phi(x)$ it follows that

$$n \int x \, d\Phi_n(x) = \int x \, d\Phi(x)$$

We denote this quantity by $\gamma$; then Equation (2) may be rewritten
in the following form:

$$\log \varphi(t) = i\gamma t + \lim_{n \to \infty} n \int \{e^{itx} - 1 - itx\}\, d\Phi_n(x)$$

n~*-a> 

Now put

$$G_n(x) = n \int_{-\infty}^{x} u^2 \, d\Phi_n(u)$$

Obviously, the functions $G_n(x)$ do not decrease with increasing
argument and $G_n(-\infty) = 0$. Besides, the functions $G_n(x)$ are
uniformly bounded. The last assertion follows from the properties of
variance and the relations between the functions $\Phi(x)$ and $\Phi_n(x)$.
Indeed,

$$\begin{aligned}
G_n(+\infty) = n \int u^2 \, d\Phi_n(u) &= n \left[ \int u^2 \, d\Phi_n(u) - \left( \int u \, d\Phi_n(u) \right)^2 \right] \\
&\quad + n \left( \int u \, d\Phi_n(u) \right)^2 = \sigma^2 + \frac{1}{n} \gamma^2
\end{aligned} \tag{3}$$

where $\sigma^2$ is the variance of the law $\Phi(x)$.
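The identity $G_n(+\infty) = \sigma^2 + \gamma^2/n$ can be illustrated on the ordinary Poisson law with parameter $\lambda$, for which $\Phi_n$ is the Poisson law with parameter $\lambda/n$ and $\sigma^2 = \gamma = \lambda$. The sketch below is an assumption-laden illustration (the function name and the truncation point `kmax` are not from the text); it computes $n$ times the second moment of $\Phi_n$ by direct summation.

```python
import math

def g_n_total(lam, n, kmax=200):
    """n * (second moment of the Poisson law with parameter lam/n).

    This is G_n(+infinity) for the ordinary Poisson law; the series is
    truncated at kmax terms, which is an illustrative assumption.
    """
    mu = lam / n
    term = math.exp(-mu)          # P{xi_n = 0}
    second_moment = 0.0
    for k in range(1, kmax):
        term *= mu / k            # now term = P{xi_n = k}
        second_moment += k * k * term
    return n * second_moment

lam = 3.0
for n in (1, 4, 50):
    # left: computed G_n(+inf); right: sigma^2 + gamma^2 / n with sigma^2 = gamma = lam
    print(n, g_n_total(lam, n), lam + lam * lam / n)
```

The bound is uniform in $n$ because the second term only decreases as $n$ grows.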

In the new notations (see Property 6 of the Stieltjes integral
in Sec. 25),

$$\log \varphi(t) = i\gamma t + \lim_{n \to \infty} \int (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_n(x)$$


By Helly’s first theorem, from the sequence of functions $G_n(x)$
one can choose a subsequence $G_{n_k}(x)$ that converges to some limit
function $G(x)$. If $A < 0$ and $B > 0$ are continuity points of the function
$G(x)$, then by virtue of the second theorem of Helly, as $k \to \infty$,

$$\int_A^B (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_{n_k}(x) \to \int_A^B (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x) \tag{4}$$

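We know that

$$|e^{itx} - 1 - itx| \le |e^{itx} - 1| + |tx| \le |tx| + |tx| = 2|t| \cdot |x|$$

and so

$$\begin{aligned}
\left| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_{n_k}(x) \right|
&\le 2|t| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) \frac{1}{|x|}\, dG_{n_k}(x) \\
&\le \frac{2|t|}{\Gamma} \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) dG_{n_k}(x)
\le \frac{2|t|}{\Gamma} \max_{1 \le k < \infty} \int dG_{n_k}(x)
\end{aligned}$$

where $\Gamma = \min(|A|, B)$. Since the variations of the functions
$G_{n_k}(x)$ are uniformly bounded, for any $\varepsilon > 0$ we can, by choosing
$A$ and $B$ sufficiently large in absolute value, make the following
inequality hold:

$$\left| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_{n_k}(x) \right| < \varepsilon \tag{5}$$

for all $t$ contained in some finite interval, and for all $k$.

From (4) and (5) it follows that for any $\varepsilon > 0$, for all $t$
contained in an arbitrary finite interval, given sufficiently large $n_k$,
the following inequality holds:

$$\left| \int (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_{n_k}(x) - \int (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x) \right| < \varepsilon$$

in other words,

$$\lim_{k \to \infty} \int (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG_{n_k}(x) = \int (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x)$$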

J * J * 

We have thus proved that the logarithm of the characteristic function
of any infinitely divisible law may be written in the form
of (1). Now we have to prove the converse proposition, that any
function whose logarithm is expressible by formula (1) is the
characteristic function of some infinitely divisible law.

For any $\varepsilon$ ($0 < \varepsilon < 1$) the integral

$$\int_{\varepsilon}^{1/\varepsilon} (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x) \tag{6}$$

by definition of the Stieltjes integral, is the limit of the sums

$$\sum_{s=1}^{n} (e^{it\bar{x}_s} - 1 - it\bar{x}_s) \frac{1}{\bar{x}_s^2} \, (G(x_{s+1}) - G(x_s))$$

where $x_1 = \varepsilon$, $x_{n+1} = 1/\varepsilon$, $x_s \le \bar{x}_s \le x_{s+1}$, and
$\max_s (x_{s+1} - x_s) \to 0$.

& 

Each term of this sum is the logarithm of the characteristic function
of some Poisson law. According to Theorems 2 and 3 of Sec. 44,
the integral (6) is the logarithm of the characteristic function
of some infinitely divisible law. Passing to the limit as $\varepsilon \to 0$, we
convince ourselves that we have the very same thing for the integral

$$\int_{x > 0} (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x) \tag{7}$$
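The step "each term is the logarithm of the characteristic function of some Poisson law" can be made concrete. In the sketch below, each term of the approximating sum is written as $\lambda_s (e^{i a_s t} - 1) + i b_s t$ with $a_s = \bar{x}_s$, $\lambda_s = (G(x_{s+1}) - G(x_s))/\bar{x}_s^2$ and $b_s = -\lambda_s \bar{x}_s$. The function $G(x) = 1 - e^{-x}$, the midpoint choice of $\bar{x}_s$ and all numeric values are assumptions made only for this illustration.

```python
import cmath
import math

def G(x):
    """Hypothetical nondecreasing bounded function, used only for illustration."""
    return 1.0 - math.exp(-x)

def poisson_log_cf(t, lam, a, b):
    """Logarithm of the characteristic function of the generalized Poisson law."""
    return lam * (cmath.exp(1j * a * t) - 1) + 1j * b * t

def stieltjes_sum(t, eps, n):
    """Approximating sum for the integral over [eps, 1/eps] as a sum of Poisson terms."""
    lo, hi = eps, 1.0 / eps
    h = (hi - lo) / n
    total = 0j
    for s in range(n):
        x_left = lo + s * h
        x_mid = x_left + h / 2                      # intermediate point of the cell
        lam = (G(x_left + h) - G(x_left)) / (x_mid * x_mid)
        total += poisson_log_cf(t, lam, a=x_mid, b=-lam * x_mid)
    return total

coarse = stieltjes_sum(1.5, 0.1, 200)
fine = stieltjes_sum(1.5, 0.1, 2000)
print(coarse, fine, abs(coarse - fine))
```

Refining the partition changes the sum only slightly, in line with the sums converging to the Stieltjes integral; the real part stays nonpositive, as it must for the logarithm of a characteristic function.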

In similar fashion we prove that the integral

$$\int_{x < 0} (e^{itx} - 1 - itx) \frac{1}{x^2}\, dG(x) \tag{8}$$

is the logarithm of the characteristic function of some infinitely
divisible law. The right-hand side of formula (1) is equal to the sum
of the integrals (7) and (8) and the quantity

$$i\gamma t - \frac{t^2}{2} \, (G(+0) - G(-0))$$

This last term is the logarithm of the characteristic function of
the normal law. From Theorem 2, Sec. 44, it follows that the
function $\varphi(t)$, expressible by means of (1), is the characteristic
function of some infinitely divisible law.* It now remains for us
to convince ourselves that the representation of $\log \varphi(t)$ by (1) is
unique, i.e., that the function $G(x)$ and the constant $\gamma$ are uniquely
determined by specification of $\varphi(t)$.

By differentiating formula (1) twice we find

$$\frac{d^2}{dt^2} \log \varphi(t) = -\int e^{itx} \, dG(x) \tag{9}$$

From the theory of characteristic functions we know that the
function $G(x)$ in this formula is uniquely determined by
$\frac{d^2}{dt^2} \log \varphi(t)$.

While proving the theorem we saw that the constant $\gamma$ is the
expectation and, hence, is also uniquely determined by the function
$\varphi(t)$.

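Finally, we note the probabilistic meaning of the total variation
of the function $G(x)$. We know that if a random variable $\xi$ is
distributed according to the law $\Phi(x)$, then (see (5) of Sec. 35)

$$\mathsf{D}\xi = -\left[ \frac{d^2}{dt^2} \log \varphi(t) \right]_{t=0}$$

From (9) it therefore follows that

$$\mathsf{D}\xi = \int dG(x) = G(+\infty)$$

The canonical representation of the normal law and the Poisson
law may serve as an illustration.

For the normal law with variance $\sigma^2$ and expectation $a$,

$$\gamma = a, \qquad G(x) = \begin{cases} 0 & \text{for } x < 0 \\ \sigma^2 & \text{for } x > 0 \end{cases}$$

Indeed, this function and the constant $\gamma$ lead to this law, since

$$\int \{e^{itx} - 1 - itx\} \frac{1}{x^2}\, dG(x) = \lim_{x \to 0} \frac{e^{itx} - 1 - itx}{x^2} \, [G(+0) - G(-0)] = -\frac{\sigma^2 t^2}{2}$$

and by virtue of the uniqueness of the canonical representation,
the other functions $G(x)$ cannot yield the normal law.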

In similar manner it is easy to see that to the Poisson law with
characteristic function

$$\varphi(t) = e^{\lambda (e^{iat} - 1) + ibt}$$

there corresponds the function $G(x)$ with a single jump at the
point $a$:

$$G(x) = \begin{cases} 0 & \text{for } x < a \\ a^2 \lambda & \text{for } x > a \end{cases}$$

and $\gamma = b + a\lambda$.

* We have just proved that any infinitely divisible law is either a
convolution of a finite number of Poisson laws and the normal law or the
limit of a uniformly converging sequence of such laws. We thus see that
the normal and Poisson laws are the basic elements that comprise every
infinitely divisible law.
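The two canonical representations just described can be verified directly when $G$ is a pure jump function, since the Stieltjes integral then reduces to a finite sum over the jumps. The following sketch assumes exactly that; the helper name `kolmogorov_log_cf` and the parameter values are hypothetical, and the kernel at $x = 0$ is taken as its limiting value $-t^2/2$.

```python
import cmath

def kolmogorov_log_cf(t, gamma, jumps):
    """Right-hand side of Kolmogorov's formula for a pure-jump G.

    jumps is a list of (location x, size of the jump of G at x).
    """
    total = 1j * gamma * t
    for x, size in jumps:
        if x == 0:
            kernel = -t * t / 2      # limit of (e^{itx} - 1 - itx) / x^2 as x -> 0
        else:
            kernel = (cmath.exp(1j * t * x) - 1 - 1j * t * x) / (x * x)
        total += kernel * size
    return total

# Illustrative parameters (assumptions): Poisson law with a, lam, b; normal law with mean, var.
a, lam, b = 2.0, 3.0, 0.5
mean, var = 1.0, 4.0
ts = [k / 10 for k in range(-30, 31)]

poisson_gap = max(
    abs(kolmogorov_log_cf(t, b + a * lam, [(a, a * a * lam)])
        - (lam * (cmath.exp(1j * a * t) - 1) + 1j * b * t))
    for t in ts
)
normal_gap = max(
    abs(kolmogorov_log_cf(t, mean, [(0.0, var)]) - (1j * mean * t - var * t * t / 2))
    for t in ts
)
print(poisson_gap, normal_gap)
```

In both cases the total size of the jumps equals the variance of the law, in agreement with $\mathsf{D}\xi = G(+\infty)$.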

Sec. 46. A Limit Theorem for Infinitely Divisible Laws 

We know that if a sequence of infinitely divisible distribution
laws converges to a limit distribution law, then this limit law is
itself infinitely divisible. We now point out the conditions that
suffice for a given sequence of infinitely divisible distribution
functions to converge to the limit distribution function.

Theorem. In order for a sequence $\{\Phi_n(x)\}$ of infinitely divisible
distribution functions to converge, as $n \to \infty$, to some distribution
function $\Phi(x)$ and for their variances to converge to the variance of
the limit law, it is necessary and sufficient that there exist a constant
$\gamma$ and a function $G(x)$ for which, as $n \to \infty$,

(1) $G_n(x)$ converges weakly to $G(x)$,

(2) $G_n(\infty) - G_n(-\infty) \to G(\infty) - G(-\infty)$,

(3) $\gamma_n \to \gamma$,

where $\gamma_n$ and $G_n(x)$ are defined by formula (1), Sec. 45, for the law
$\Phi_n(x)$, and the constant $\gamma$ and the function $G(x)$ define, by the same
formula, the limit law $\Phi(x)$.

Proof. The sufficiency of the conditions of the theorem is a direct
consequence of Helly’s second theorem. Indeed, from the conditions
of the theorem and from formula (1), Sec. 45, it follows that
as $n \to \infty$

$$\log \varphi_n(t) \to \log \varphi(t)$$

uniformly in every finite interval of $t$.

In the preceding section we saw that the integrals

$$\int dG_n(u) \quad \text{and} \quad \int dG(u)$$

were equal to the variances of the laws $\Phi_n(x)$ and $\Phi(x)$; therefore,
the second condition of the theorem is nothing other than the
requirement of convergence of the variances.

Suppose we now know that as $n$ tends to infinity

$$\Phi_n(x) \to \Phi(x) \tag{1}$$

and the variances of the laws $\Phi_n(x)$ converge to the variance of the
limit law $\Phi(x)$. We shall prove that these requirements imply
fulfillment of the conditions of the theorem. As we have already noticed,
this does not require any supplementary argument with regard to
condition 2. From this it follows that the total variations of the
functions $G_n(u)$ are uniformly bounded. We can therefore take advantage
of Helly’s first theorem and from the sequence of functions $G_n(u)$
we can choose a subsequence $G_{n_k}(u)$ that converges to some limit
function $G_\infty(u)$ as $k \to \infty$. Our purpose consists in proving the equation

$$G_\infty(u) = G(u)$$

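To do this, we first establish that

$$J_k = \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \to J_\infty = \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_\infty(u) \tag{2}$$

as $k$ tends to infinity. Let $A < 0$ and $B > 0$ be continuity points
of the function $G_\infty(u)$; then by Helly’s second theorem, as $k \to \infty$,

$$\int_A^B \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \to \int_A^B \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_\infty(u) \tag{3}$$

On the other hand, from the inequality

$$|e^{itu} - 1 - itu| \le 2|tu|$$

we see that

$$\begin{aligned}
\left| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \right|
&\le 2|t| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) \frac{1}{|u|}\, dG_{n_k}(u) \\
&\le \frac{2|t|}{\Gamma} \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) dG_{n_k}(u)
\end{aligned}$$

where $\Gamma = \min(-A, B)$. By virtue of the uniform boundedness of
the variations of the functions $G_n(u)$, for any $\varepsilon > 0$ it is possible
to select $A$ and $B$ so large in absolute value that

$$\left| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \right| < \varepsilon \tag{4}$$

Similarly, for any $\varepsilon > 0$, the inequality

$$\left| \left( \int_{-\infty}^{A} + \int_{B}^{\infty} \right) \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_\infty(u) \right| < \varepsilon \tag{5}$$

holds for $A$ and $B$ sufficiently large in absolute value. From the
relations (3), (4) and (5) we conclude that no matter what $\varepsilon > 0$,
for sufficiently large values of $k$

$$|J_k - J_\infty| < 3\varepsilon$$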

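Relation (2) is thus proved. From (1) we see that

$$\begin{aligned}
\lim_{n \to \infty} \log \varphi_n(t) &= \lim_{n \to \infty} \left( i\gamma_n t + \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_n(u) \right) \\
&= \log \varphi(t) = i\gamma t + \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG(u),
\end{aligned}$$

or

$$\lim_{k \to \infty} \left[ i\gamma_{n_k} + \frac{1}{t} \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \right] = i\gamma + \frac{1}{t} \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG(u) \tag{6}$$

From the inequality

$$|e^{itu} - 1 - itu| \le \frac{t^2 u^2}{2}$$

and the uniform boundedness of the total variations of the functions
$G_{n_k}(u)$ we conclude that as $t \to 0$

$$\left| \frac{1}{t} \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_{n_k}(u) \right| \le \frac{|t|}{2} \int dG_{n_k}(u) \to 0$$

uniformly in $k$. And so, as $t \to 0$, formula (6) yields

$$\lim_{k \to \infty} \gamma_{n_k} = \gamma \tag{7}$$

and, on the other hand, from (2) and (7),

$$\log \varphi(t) = i\gamma t + \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_\infty(u)$$

By virtue of the uniqueness of the representation of infinitely
divisible laws by formula (1), Sec. 45, we conclude that $G_\infty(u) = G(u)$.

To summarize, then, any convergent sequence of functions $G_{n_k}(u)$
converges to the function $G(u)$, and at the same time the constants
$\gamma_{n_k}$ converge to $\gamma$.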

It is now easy to prove that the entire sequence $G_n(u)$ also converges
to $G(u)$ and, hence, at the same time $\lim_{n \to \infty} \gamma_n = \gamma$, for otherwise
there would be a point of continuity of the function $G(u)$, call it $c$,
and a subsequence of the functions $G_{n_k}(u)$, which at the point $u = c$
converges to a number different from $G(c)$ as $k \to \infty$. By Helly’s first
theorem we can extract from this sequence a convergent subsequence
$G_{n_{k_r}}(u)$.

From the foregoing it follows that at all points of continuity of
the function $G(u)$

$$\lim_{r \to \infty} G_{n_{k_r}}(u) = G(u)$$

This contradicts our assumption. Thus, at all points of continuity
of the function $G(u)$

$$\lim_{n \to \infty} G_n(u) = G(u)$$

From this, as we have seen, there follows immediately

$$\lim_{n \to \infty} \gamma_n = \gamma$$

The theorem is proved.

Sec. 47. Statement of the Problem of Limit Theorems for Sums

Suppose we are given the double sequence

$$\begin{array}{l}
\xi_{11}, \ \xi_{12}, \ \ldots, \ \xi_{1k_1}, \\
\xi_{21}, \ \xi_{22}, \ \ldots, \ \xi_{2k_2}, \\
\ldots \ldots \ldots \ldots \ldots \\
\xi_{n1}, \ \xi_{n2}, \ \ldots, \ \xi_{nk_n}, \\
\ldots \ldots \ldots \ldots \ldots
\end{array} \tag{1}$$

of random variables that are independent in each row. We want to know
to what limit distribution functions the distribution functions of
the sums

$$\zeta_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nk_n}$$

can converge as $n$ tends to infinity and what the conditions of this
convergence are.

Henceforth we shall confine ourselves to the study of elementary
systems, that is, double sequences (1) which satisfy the following
conditions:

(1) the variables $\xi_{nk}$ have finite variances,

(2) the variances of the sums $\zeta_n$ are bounded by a constant $C$ not
dependent on $n$,

(3) $\max_{1 \le k \le k_n} \mathsf{D}\xi_{nk} \to 0$ as $n \to \infty$.

The last requirement means that the effect of the separate terms
on the sum becomes less and less as $n$ increases.

The limit theorems for sums that we considered earlier quite
obviously fit into this general scheme. For instance, in the theorems
of DeMoivre-Laplace and Lyapunov we had the following double
sequence:

$$\xi_{n1}, \ \xi_{n2}, \ \ldots, \ \xi_{nn}$$

where

$$\xi_{nk} = \frac{\xi_k - \mathsf{M}\xi_k}{B_n} \quad (1 \le k \le n; \ n = 1, 2, \ldots), \qquad B_n^2 = \sum_{k=1}^{n} \mathsf{D}\xi_k$$

In the theorems of Bernoulli, Chebyshev and Markov concerning
the law of large numbers we likewise had to do with double sequences
in which the quantities

$$\frac{\xi_k - \mathsf{M}\xi_k}{n}$$

are taken for $\xi_{nk}$.
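The three conditions of an elementary system are easy to check for a row normalized in the manner of the DeMoivre-Laplace and Lyapunov theorems. The sketch below is illustrative only: it assumes hypothetical summands with variances $\mathsf{D}\xi_k = k$ and divides each summand by $B_n$, where $B_n^2$ is the sum of the variances in the row.

```python
def normalized_row_variances(variances):
    """Variances of xi_nk = (xi_k - M xi_k) / B_n, where B_n^2 = sum of the variances."""
    b_n_squared = sum(variances)
    return [v / b_n_squared for v in variances]

def row_summary(n):
    """Return (variance of the row sum, largest single variance) for row n."""
    # Hypothetical summands with variances D xi_k = k, chosen only for illustration.
    row = normalized_row_variances([float(k) for k in range(1, n + 1)])
    return sum(row), max(row)

for n in (10, 100, 1000):
    total, largest = row_summary(n)
    print(n, total, largest)
```

Under this assumption the variance of each row sum is 1, so condition (2) holds with $C = 1$, and the largest single variance equals $2/(n+1)$, which tends to zero as condition (3) requires.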







Sec. 48. Limit Theorems for Sums 


Let there be an elementary system. Denote by $F_{nk}(x)$ the distribution
function of the random variable $\xi_{nk}$ and by $\bar{F}_{nk}(x)$ the
distribution function of the variable $\bar{\xi}_{nk} = \xi_{nk} - \mathsf{M}\xi_{nk}$.
It is obvious that

$$\bar{F}_{nk}(x) = F_{nk}(x + \mathsf{M}\xi_{nk})$$

Theorem 1. In order for the distribution functions of the sums

$$\zeta_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nk_n} \tag{1}$$

to converge to a limit distribution function as $n \to \infty$, it is
necessary and sufficient for infinitely divisible laws, the logarithms of
the characteristic functions of which are given by the formula

$$\psi_n(t) = \sum_{k=1}^{k_n} \left\{ it\,\mathsf{M}\xi_{nk} + \int (e^{itx} - 1)\, d\bar{F}_{nk}(x) \right\} \tag{2}$$

to converge to a limit law.

The limit laws for both sequences coincide.

* If we introduce the notations

$$\gamma_n = \sum_{k=1}^{k_n} \mathsf{M}\xi_{nk}, \qquad G_n(x) = \sum_{k=1}^{k_n} \int_{-\infty}^{x} u^2 \, d\bar{F}_{nk}(u)$$

and note that $\int x \, d\bar{F}_{nk}(x) = 0$, then the functions $\psi_n(t)$ may be
written as follows:

$$\psi_n(t) = i\gamma_n t + \int \{e^{itu} - 1 - itu\} \frac{1}{u^2}\, dG_n(u)$$

As we know, this means that $\psi_n(t)$ is the logarithm of the characteristic
function of some infinitely divisible law.

It will be noted that the variances of $\zeta_n$ and of the infinitely
divisible laws (2) coincide.

Proof. The characteristic function of the sum (1) is

$$f_n(t) = \prod_{k=1}^{k_n} f_{nk}(t) = e^{it \sum_{k=1}^{k_n} \mathsf{M}\xi_{nk}} \prod_{k=1}^{k_n} \bar{f}_{nk}(t) \tag{3}$$

where $f_{nk}(t)$ is the characteristic function of the random variable
$\xi_{nk}$ and $\bar{f}_{nk}(t)$ is the characteristic function of the variable
$\bar{\xi}_{nk}$.

We know that for convergence of the distribution functions of
the sums (1) to the limit distribution function $\Phi(x)$, it is
necessary and sufficient that, as $n \to \infty$,

$$f_n(t) \to \varphi(t)$$

where $\varphi(t)$ is a continuous function; then $\varphi(t)$ is the characteristic
function of the law $\Phi(x)$.

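Let

$$\alpha_{nk} = \bar{f}_{nk}(t) - 1$$

For the variables $\alpha_{nk}$,

$$\alpha_n = \max_{1 \le k \le k_n} |\alpha_{nk}| \to 0 \tag{4}$$

uniformly in every finite interval of $t$. Indeed,

$$\alpha_{nk} = \int (e^{itx} - 1)\, d\bar{F}_{nk}(x) = \int (e^{itx} - 1 - itx)\, d\bar{F}_{nk}(x)$$

since

$$\mathsf{M}\bar{\xi}_{nk} = \int x \, d\bar{F}_{nk}(x) = 0$$

We know that for all real $\alpha$

$$|e^{i\alpha} - 1 - i\alpha| \le \frac{\alpha^2}{2}$$

Therefore,

$$|\alpha_{nk}| \le \frac{t^2}{2} \int x^2 \, d\bar{F}_{nk}(x) = \frac{t^2}{2}\, \mathsf{D}\xi_{nk} \tag{5}$$

From (5) and the third condition of elementariness of a system
there follows (4).

From (4) we first of all conclude that for any $T$ we can assume
that for sufficiently large $n$ and $|t| \le T$

$$|\alpha_{nk}| < \frac{1}{2} \tag{6}$$

By virtue of this fact we can make use of the series expansion of
the logarithm

$$\log \bar{f}_{nk}(t) = \log (1 + \alpha_{nk}) = \alpha_{nk} - \frac{\alpha_{nk}^2}{2} + \frac{\alpha_{nk}^3}{3} - \ldots$$

Obviously,

$$\begin{aligned}
R_n &= \left| \log f_n(t) - \sum_{k=1}^{k_n} (it\,\mathsf{M}\xi_{nk} + \alpha_{nk}) \right|
= \left| \sum_{k=1}^{k_n} (\log \bar{f}_{nk}(t) - \alpha_{nk}) \right| \\
&\le \sum_{k=1}^{k_n} \sum_{s=2}^{\infty} \frac{|\alpha_{nk}|^s}{s}
\le \frac{1}{2} \sum_{k=1}^{k_n} \frac{|\alpha_{nk}|^2}{1 - |\alpha_{nk}|}
\end{aligned} \tag{7}$$

Formula (5) leads to the inequality

$$R_n \le \max_{1 \le k \le k_n} |\alpha_{nk}| \sum_{k=1}^{k_n} |\alpha_{nk}| \le \frac{t^2}{2}\, C \max_{1 \le k \le k_n} |\alpha_{nk}|$$

From (4) we conclude that

$$R_n \to 0 \tag{8}$$

uniformly in every finite interval of $t$ as $n$ tends to infinity.

We have thus established that in every elementary system the
distribution functions of the sums and the infinitely divisible
distribution functions defined by formula (2) are asymptotic as $n$ tends
to infinity, and so Theorem 1 is proved.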

This theorem makes it possible to replace the investigation of sums
(1) of random variables having, generally speaking, arbitrary
distribution functions by the study of infinitely divisible laws, which,
as we shall see, is in many cases extremely simple.
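A small numerical illustration of this replacement may help. The sketch below assumes a particular elementary system (each row consists of $n$ independent variables taking the value 1 with probability $\lambda/n$ and 0 otherwise) and compares the characteristic function of the row sum with the exponential of the function $\psi_n(t)$ of formula (2). The function names and the parameter values are illustrative assumptions.

```python
import cmath

def sum_cf(t, n, lam):
    """Characteristic function of the row sum of n Bernoulli(lam/n) variables."""
    p = lam / n
    return (1 + p * (cmath.exp(1j * t) - 1)) ** n

def accompanying_cf(t, n, lam):
    """exp(psi_n(t)) for the infinitely divisible law of formula (2), same row."""
    p = lam / n
    # characteristic function of the centered summand xi_nk - M xi_nk
    centered_cf = cmath.exp(-1j * t * p) * (1 + p * (cmath.exp(1j * t) - 1))
    psi = n * (1j * t * p + centered_cf - 1)
    return cmath.exp(psi)

def max_gap(n, lam=2.0):
    """Largest distance between the two characteristic functions for |t| <= 2."""
    ts = [k / 10 for k in range(-20, 21)]
    return max(abs(sum_cf(t, n, lam) - accompanying_cf(t, n, lam)) for t in ts)

gaps = {n: max_gap(n) for n in (10, 100, 1000)}
print(gaps)
```

The gap shrinks as $n$ grows, which is what the estimate of $R_n$ in the proof predicts for this system.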

Theorem 2. Every distribution law that is a limit law for the
distribution functions of sums in an elementary system is infinitely
divisible with finite variance and, conversely, every infinitely divisible
law with finite variance is a limit law for the distribution functions of
the sums of some elementary system.

Proof. From Theorem 1 we know that the limit law for the distribution
functions of sums (1) is a limit law for infinitely divisible laws
and, consequently, by Theorem 3, Sec. 44, it is infinitely divisible;
its variance is finite since the variances of sums are uniformly bounded
by the second condition of elementariness of a system. The converse
proposition that every infinitely divisible law with finite variance
is a limit law for sums follows directly from the definition of
infinitely divisible laws.

Theorem 3. In order that the distribution functions of the sums (1)
converge, as $n \to \infty$, to some limit distribution function and their
variances converge to the variance of the limit law, it is necessary and
sufficient that there exist a function $G(u)$ and a constant $\gamma$ such that
as $n$ tends to infinity

(1) $\sum_{k=1}^{k_n} \int_{-\infty}^{u} x^2 \, d\bar{F}_{nk}(x) \to G(u)$

at the continuity points of the function $G(u)$,

(2) $\sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^2 \, d\bar{F}_{nk}(x) \to G(+\infty)$,

(3) $\sum_{k=1}^{k_n} \int x \, dF_{nk}(x) \to \gamma$.

The logarithm of the characteristic function of the limit law is
given by formula (1) of Sec. 45 with the function $G(u)$ and the
constant $\gamma$ that have just been defined.

Proof. If we introduce the notations

$$G_n(u) = \sum_{k=1}^{k_n} \int_{-\infty}^{u} x^2 \, d\bar{F}_{nk}(x)$$

and

$$\gamma_n = \sum_{k=1}^{k_n} \int x \, dF_{nk}(x)$$

we arrive at the conditions of the theorem of Sec. 46. That proves
the theorem.

By slightly modifying the formulation of Theorem 3 we can 
obtain not only the conditions for the existence of the limit law, 
but also the conditions for convergence to every given limit law. 

Theorem 4. In order that the distribution functions of the sums (1)
converge, as $n$ tends to infinity, to a given distribution function $\Phi(x)$
and the variances of the sums converge to the variance of the limit
law, it is necessary and sufficient that as $n$ tends to infinity the
following conditions be satisfied:

(1) $\sum_{k=1}^{k_n} \int_{-\infty}^{u} x^2 \, d\bar{F}_{nk}(x) \to G(u)$

at the continuity points of the function $G(u)$,

(2) $\sum_{k=1}^{k_n} \int_{-\infty}^{\infty} x^2 \, d\bar{F}_{nk}(x) \to G(\infty)$,

(3) $\sum_{k=1}^{k_n} \int x \, dF_{nk}(x) \to \gamma$,

where the function $G(u)$ and the constant $\gamma$ are given by formula (1)
of Sec. 45 for the function $\Phi(x)$.

Sec. 49. Conditions for Convergence to the Normal 
and Poisson Laws 

We shall apply the results of Sec. 48 in order to derive the 
conditions for the convergence of the distribution functions of sums 
to the normal and Poisson laws. 





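Theorem 1. Given an elementary system of independent random
variables. For the distribution functions of the sums

$$\zeta_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nk_n} \tag{1}$$

to converge, as $n$ tends to infinity, to the law

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{z^2}{2}} \, dz$$

it is necessary and sufficient that, as $n$ tends to infinity, the
following conditions be satisfied:

(1) $\sum_{k=1}^{k_n} \int x \, dF_{nk}(x) \to 0$,

(2) $\sum_{k=1}^{k_n} \int_{|x| > \tau} x^2 \, d\bar{F}_{nk}(x) \to 0$,

(3) $\sum_{k=1}^{k_n} \int_{|x| < \tau} x^2 \, d\bar{F}_{nk}(x) \to 1$,

where $\tau$ is any positive constant.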


Proof. From Theorem 4, Sec. 48, it follows that the desired
conditions consist in satisfaction of the following relations, as $n$
tends to infinity,

$$\sum_{k=1}^{k_n} \int x \, dF_{nk}(x) \to 0$$

$$\sum_{k=1}^{k_n} \int_{-\infty}^{u} x^2 \, d\bar{F}_{nk}(x) \to \begin{cases} 0 & \text{for } u < 0 \\ 1 & \text{for } u > 0 \end{cases}$$

$$\sum_{k=1}^{k_n} \int x^2 \, d\bar{F}_{nk}(x) \to 1$$

The first one coincides with the first condition of the theorem,
and it is obvious that the two others are equivalent to the second
and third conditions of the theorem.

This theorem takes on an especially simple form if the elementary
system under consideration is normalized beforehand by the conditions

$$\sum_{k=1}^{k_n} \int x^2 \, dF_{nk}(x) = 1, \qquad \int x \, dF_{nk}(x) = 0 \quad (1 \le k \le k_n; \ n = 1, 2, \ldots) \tag{2}$$





Theorem 2. If an elementary system is normalized by the relations (2),
then for convergence of the distribution functions of the
sums (1) to the normal law it is necessary and sufficient that for
all $\tau > 0$, as $n$ tends to infinity,

$$\sum_{k=1}^{k_n} \int_{|x| > \tau} x^2 \, dF_{nk}(x) \to 0 \tag{3}$$

The proof of the theorem is obvious.

The requirement (3) bears the name of Lindeberg's condition because
Lindeberg, in 1923, proved its sufficiency for convergence of
the distribution functions of sums to the normal law. In 1935
W. Feller proved the necessity of this condition.
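For summands with finitely many values, the Lindeberg sum in (3) is a finite sum and can be computed directly. The sketch below is an illustration under stated assumptions: each row consists of $n$ identically distributed summands $(X - p)/\sqrt{npq}$ with $X$ taking the values 1 and 0 with probabilities $p$ and $q = 1 - p$, so that the normalization (2) holds. It contrasts a fixed $p$ with the rare-event case $p = 2/n$; the function names and the numbers are hypothetical.

```python
import math

def normalized_bernoulli_row(n, p):
    """Row of n summands (X - p) / sqrt(n p q); each given as (value, probability) pairs."""
    s = math.sqrt(n * p * (1 - p))
    dist = [((1 - p) / s, p), (-p / s, 1 - p)]
    return [dist] * n

def lindeberg_sum(row, tau):
    """Sum over the row of the integrals of x^2 over the region |x| > tau."""
    return sum(prob * x * x for dist in row for x, prob in dist if abs(x) > tau)

n, tau = 10_000, 0.1
fixed_p = lindeberg_sum(normalized_bernoulli_row(n, 0.3), tau)
rare_p = lindeberg_sum(normalized_bernoulli_row(n, 2.0 / n), tau)
print(fixed_p, rare_p)
```

With fixed $p$ every value of a summand eventually falls below $\tau$ and the sum vanishes, whereas in the rare-event case the sum stays near 1, so condition (3) fails and the normal law is not the limit.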

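To illustrate the use of the general theorems of the preceding
section we consider the convergence of the distribution functions
of elementary systems to the Poisson law:

$$P(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \displaystyle\sum_{0 \le k < x} \frac{\lambda^k e^{-\lambda}}{k!} & \text{for } x > 0 \end{cases} \tag{4}$$

If $\xi$ is a random variable distributed in accordance with the
law (4), then, as we know, $\mathsf{M}\xi = \mathsf{D}\xi = \lambda$.

We confine ourselves to elementary systems for which, as $n \to \infty$,

$$\sum_{k=1}^{k_n} \mathsf{M}\xi_{nk} \to \lambda, \qquad \sum_{k=1}^{k_n} \mathsf{D}\xi_{nk} \to \lambda \tag{5}$$

Theorem 3. Let there be given an elementary system that obeys
the conditions (5). The distribution functions of the sums

$$\zeta_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nk_n}$$

converge to the law (4) if and only if for any $\tau > 0$

$$\sum_{k=1}^{k_n} \int_{|x - 1| > \tau} x^2 \, d\bar{F}_{nk}(x) \to 0 \quad (n \to \infty)$$

We leave the proof of this theorem to the reader.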

In Sec. 15 we proved the Poisson theorem. It will readily be seen
that when $np_n = \lambda$ it is a special case of the proposition that has just
been proved. Indeed, let $\xi_{nk}$ ($1 \le k \le n$) be a random variable that
takes on the value 1 or 0 depending on the occurrence or nonoccurrence,
in the $k$th trial of the $n$th series of trials, of the event $A$ that we are
observing. Here

$$\mathsf{P}\{\xi_{nk} = 1\} = \frac{\lambda}{n} \quad \text{and} \quad \mathsf{P}\{\xi_{nk} = 0\} = 1 - \frac{\lambda}{n}$$

Obviously, the sum

$$\mu_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nn}$$

is the number of occurrences of the event $A$ in the $n$th series of trials.

According to the Poisson theorem, the distribution functions of
the variables $\mu_n$ converge to the Poisson law (4) as $n \to \infty$. This result
also follows from the theorem that has just been formulated, since
in the given case all its requirements are satisfied.
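The convergence of $\mu_n$ to the Poisson law can be observed numerically by comparing the binomial probabilities with parameters $n$ and $\lambda/n$ against the Poisson probabilities. The following is a minimal sketch; the value $\lambda = 2$, the truncation at `kmax` and the choice of the summed absolute difference as a measure of closeness are assumptions made for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P{mu_n = k} for n independent trials with success probability p."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of the value k under the Poisson law with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def total_gap(n, lam=2.0, kmax=60):
    """Sum over k <= kmax of |binomial - Poisson| probabilities, with p = lam / n."""
    p = lam / n
    return sum(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(kmax + 1))

gaps = {n: total_gap(n) for n in (10, 100, 1000)}
print(gaps)
```

The summed difference decreases as the number of trials in a series grows, in agreement with the theorem.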

The general theorems concerning the approach of the distribution
functions of the sums (1) to some infinitely divisible distribution
functions, proved under broader assumptions than ours, also permit
obtaining the necessary and sufficient condition for the law of large
numbers (in the case of independent summands). See the earlier
mentioned monograph of B. V. Gnedenko and A. N. Kolmogorov.


EXERCISES 


1. Prove that the distributions of 

(a) Pascal (Exercise 1 (a) of Chapter 5), 

(b) Polya (Exercise 1 (b) of Chapter 5), 

(c) Cauchy (Example 5, Sec. 24) 
are infinitely divisible. 

2. Prove that a random variable with density function

$$p(x) = \begin{cases} 0 & \text{for } x \le 0 \\ \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} \, x^{\alpha - 1} e^{-\beta x} & \text{for } x > 0 \end{cases}$$

where $\alpha > 0$, $\beta > 0$ are constants, is infinitely divisible.

Note. From this it follows, in particular, that the Maxwell distribution and
the chi-square distribution are infinitely divisible for any value of $n$.

3. Prove that, no matter what the constants $\alpha > 0$ and $\beta > 0$,

$$\varphi(t) = \left( 1 + \frac{t^2}{\beta^2} \right)^{-\alpha}$$

is an infinitely divisible characteristic function.

Note. From this it follows, in particular, that the Laplace distribution
(Exercise 6 of Chapter 5) is infinitely divisible.

4. Find the function G (x) and the parameter y in Kolmogorov’s formula for 
the logarithm of an infinitely divisible characteristic function for: 

(a) the distribution in Example 2, 

(b) the Laplace distribution. 





5. Prove that if the sum of two independent infinitely divisible random
variables is distributed according to

(a) the Poisson law,

(b) the normal law,

then every summand is Poisson distributed in case (a) and normally distributed
in case (b).

6. Find the conditions under which the distribution functions of sums of
random variables constituting an elementary system converge to:

(a) the distribution of Example 2, 

(b) the Laplace distribution. 



CHAPTER 10

The Theory of Stochastic Processes

Sec. 50. Introductory Remarks 

Refinements in the statistics of physics and in a number of branches
of technology have confronted probability theory with a large
number of new problems that do not fit into the framework of the
classical theory. Whereas physics and technology were interested in
studying processes, that is, phenomena that take place in time, the theory
of probability did not have either general procedures or elaborated
particular schemes for solving problems that arise in the study of such
phenomena. It was insistently necessary to develop a general theory
of random processes, a theory which would study random variables
dependent on one or several continuously varying parameters.

Let us examine a number of problems that will illustrate the
necessity of constructing a theory of random processes.

Suppose our purpose is to trace the movement of some molecule
of a gas or liquid. At random instants the molecule collides with
other molecules and thus alters its velocity and position. The state
of the molecule is thus subjected to random changes at every instant
of time. Many physical phenomena require the ability to compute
the probabilities that a definite number of molecules will be able,
within a given time interval, to cover certain distances. For example,
if two gases or two liquids are brought into contact, a mutual
penetration of the molecules of one into those of the other sets in:
diffusion occurs. How fast does the diffusion process develop, according
to what kind of laws, and when does the resulting mixture become
practically homogeneous? All these and many other questions are
answered by the statistical theory of diffusion, which is based on the
theory of random (stochastic or probabilistic) processes. It is obvious
that a similar problem arises in chemistry when studying the process of
a chemical reaction. What portion of the molecules has already entered
into the reaction, how is the reaction proceeding in time, and when,
for all practical purposes, will it come to an end?

An extremely important range of phenomena take place in accordance
with the principle of radioactive disintegration. This consists
in the atoms of a radioactive substance decaying into the atoms of
another element. The decay of every atom occurs instantaneously,
like an explosion, with release of a certain amount of energy.
Numerous observations have shown that the decay of different atoms
takes place at randomly chosen instants of time, as far as the observer
is concerned. And the times of occurrence are independent of one
another in the meaning of probability theory. For studies of the process
of radioactive decay it is essential to determine the probability that
within a certain time interval a certain quantity of atoms will
disintegrate. Formally speaking, other phenomena proceed in exactly the
same fashion if we confine ourselves to elucidating the mathematical
picture of the phenomenon. Such are the number of calls at a telephone
exchange during a given time interval (the load or traffic at the
telephone exchange), the breakage of thread on a ring spinning frame
(type of spinning loom) or changes in the number of particles that are in
Brownian motion and that are located in a given region of space at a given
instant of time. In this chapter we give a simple solution to the
mathematical problems that such phenomena lead to.

We have considered some simple problems of a practical nature
concerned with concrete stochastic processes (see Chapter 1, Sec. 10,
Examples 2, 3, 4 and Exercises 21 and 22 of Chapter 1, and 15 and
16 of Chapter 2).

To what has been said let us add that in the introduction it was
stated that the first examples of probabilistic processes were considered
at the beginning of this century by a number of outstanding
physicists. We shall now give a brief description of how, by proceeding
from the extremely schematic random walk problem, they were able
to obtain the differential equation of diffusion theory. The line of
reasoning is as follows: let a particle be subject, at times $k\tau$ ($k = 1, 2, \ldots$),
to independent random impacts as a result of which it is displaced
each time to the right by an amount $h$ with probability $p$ and to
the left by an amount $h$ with probability $q = 1 - p$. Denote by $f(x, t)$
the probability that the moving particle, after starting from the point
$x = 0$ at time $t = 0$, will, as a result of $n$ impacts, reach the position $x$
at time $t$ ($t = n\tau$). It is clear that in the case of an even number of impacts
the quantity $x$ will only equal an even number of steps $h$ and,
for an odd number $n$, the number of steps $h$ will be odd. If by $m$ we
denote the number of steps taken by the particle to the right ($n - m$,
respectively, the number of steps to the left), then, according to
the Bernoulli formula,

$$f(x, t) = C_n^m p^m q^{n-m}$$

It is clear that the quantities $m$, $n$, $x$ and $h$ are connected by the
equation

$$m - (n - m) = \frac{x}{h}$$


Direct calculation immediately shows that $f(x, t)$ satisfies the
following difference equation:

$$f(x, t + \tau) = p f(x - h, t) + q f(x + h, t) \qquad (1)$$

and the initial conditions

$$f(0, 0) = 1, \qquad f(x, 0) = 0 \ \text{ for } x \neq 0$$
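The claim that the Bernoulli formula satisfies difference equation (1) can be checked numerically. The following is only a sketch: positions are measured in units of $h$ and times in units of $\tau$, and the function names `walk_prob` and `check_difference_equation` are illustrative, not taken from the text.

```python
from math import comb

def walk_prob(k: int, n: int, p: float) -> float:
    """Probability that the particle is at x = k*h after n impacts.

    k = m - (n - m), where m is the number of steps to the right,
    so m = (n + k) / 2 must be an integer between 0 and n.
    """
    if abs(k) > n or (n + k) % 2 != 0:
        return 0.0
    m = (n + k) // 2
    return comb(n, m) * p**m * (1 - p) ** (n - m)

def check_difference_equation(n_max: int, p: float) -> float:
    """Largest violation of f(x, t + tau) = p f(x - h, t) + q f(x + h, t)."""
    q = 1 - p
    worst = 0.0
    for n in range(n_max):
        for k in range(-(n + 1), n + 2):
            lhs = walk_prob(k, n + 1, p)
            rhs = p * walk_prob(k - 1, n, p) + q * walk_prob(k + 1, n, p)
            worst = max(worst, abs(lhs - rhs))
    return worst

# The discrepancy should be at the level of floating-point rounding.
print(check_difference_equation(20, 0.3))
```

The parity test in `walk_prob` reflects the remark that an even number of impacts can only produce an even number of steps $h$.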


Let us see how the difference equation changes if we let both
$h$ and $\tau$ tend to zero. The physical nature of the problem will
compel us to impose certain restrictions on $h$ and $\tau$. By the same
reasoning, the quantities $p$ and $q$ cannot be taken arbitrarily. Nonobservance
of certain conditions discussed below can result in the
particle, within a finite interval of time, going off to infinity with
probability one. To escape this possibility, we impose the following
requirements: when $n$ tends to infinity,

$$x = nh, \quad t = n\tau, \quad \frac{h^2}{\tau} \to 2D, \quad \frac{p - q}{h} \to \frac{c}{D} \qquad (2)$$

where $c$ and $D$ are certain constants, the former called the drift, the
latter, the diffusion coefficient.

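Subtracting $f(x, t)$ from both sides of (1), we get

$$f(x, t + \tau) - f(x, t) = p[f(x - h, t) - f(x, t)] + q[f(x + h, t) - f(x, t)] \qquad (3)$$

Suppose that $f(x, t)$ is differentiable with respect to $t$ and twice
differentiable with respect to $x$. Then

$$f(x, t + \tau) - f(x, t) = \tau \frac{\partial f(x, t)}{\partial t} + o(\tau)$$

$$f(x - h, t) - f(x, t) = -h \frac{\partial f(x, t)}{\partial x} + \frac{h^2}{2} \frac{\partial^2 f(x, t)}{\partial x^2} + o(h^2)$$

$$f(x + h, t) - f(x, t) = h \frac{\partial f(x, t)}{\partial x} + \frac{h^2}{2} \frac{\partial^2 f(x, t)}{\partial x^2} + o(h^2)$$

Substituting these equations into (3) we get

$$\tau \frac{\partial f(x, t)}{\partial t} + o(\tau) = -(p - q) h \frac{\partial f(x, t)}{\partial x} + \frac{h^2}{2} \frac{\partial^2 f(x, t)}{\partial x^2} + o(h^2)$$

And from this, dividing by $\tau$ and using the relations (2), we find that in the
limit

$$\frac{\partial f(x, t)}{\partial t} = -2c \frac{\partial f(x, t)}{\partial x} + D \frac{\partial^2 f(x, t)}{\partial x^2}$$
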


We have arrived at an equation that in diffusion theory is called 
the Fokker-Planck equation. 

It is interesting to note that this rather artificial statement of the 
problem has yielded a physically meaningful result which reflects 
very well the true picture of diffusion. Later on we will derive the 
general equations obeyed by distributions for stochastic processes 
under extremely broad assumptions concerning the nature of their 
development. 
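The role of the scaling requirements can be illustrated by computing the exact mean and variance of the walk. This is a sketch under the assumption that relations (2) are read as $h^2/\tau \to 2D$ and $(p - q)/h \to c/D$; the function name and the parameter values are illustrative only.

```python
from math import sqrt

def scaled_walk_moments(c: float, D: float, t: float, n: int):
    """Mean and variance of the particle position after n impacts,
    with tau = t/n, h = sqrt(2*D*tau) and p - q = c*h/D (assumed reading of (2))."""
    tau = t / n
    h = sqrt(2 * D * tau)
    p = 0.5 * (1 + c * h / D)
    q = 1 - p
    mean = n * h * (p - q)        # each step has mean h*(p - q)
    var = n * h * h * 4 * p * q   # each step has variance 4*p*q*h^2
    return mean, var

for n in (10, 1000, 100000):
    print(n, scaled_walk_moments(c=0.5, D=1.0, t=2.0, n=n))
```

Under this reading the mean equals $2ct$ for every $n$ and the variance tends to $2Dt$, so the particle stays at a finite distance, in agreement with a drift term $2c$ and a diffusion term $D$ in the limiting equation.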

The general theory of stochastic processes originated in the fundamental
works of the Soviet mathematicians A. N. Kolmogorov and
A. Ya. Khinchin at the beginning of the 1930s. Kolmogorov, in a paper
entitled “On Analytical Methods in Probability Theory”, gave a systematic
and rigorous construction of the fundamentals of the theory
of stochastic processes without aftereffect or, as it is customary to say,
Markov processes. In a number of works, Khinchin created the principles
of the theory of so-called stationary processes.

Before subjecting natural or technical processes to a mathematical
study, they first have to be made schematic. The reason for this lies
in the fact that mathematical analysis is applicable to the investigation
of a process of variation of some system only if it is assumed
that every possible state of the system is fully defined by means of
some definite mathematical apparatus. Quite naturally, such a mathematically
defined system is not actuality itself, but only a scheme
suitable for its description. That, for instance, is what we encounter
in mechanics when it is assumed that the real motions of a system of
mass points can be completely described for any instant of time by
indicating the instant and its state at any preceding time $t_0$. In other
words, the scheme used in theoretical mechanics for describing motion
consists in the following: it is taken that for any time $t$ the state
of a system $y$ is fully determined by its state $x$ at any preceding time
$t_0$. Here, the state of the system in mechanics is understood to be the
specification of positions and velocities of the points of the material
system.

Outside classical mechanics, actually throughout the whole of modern
physics, one has to do with a far more complicated situation when
a knowledge of the state of a system at some time $t_0$ no longer uniquely
determines the state of the system at subsequent times but only determines
the probability that the system will be in one of the states of
a certain set of states of the system. If by $x$ we denote the state of
the system at time $t_0$ and by $E$ a certain collection of states of the system,
then for the processes just described there is defined the probability

$$P\{t_0, x; t, E\}$$

that the system which at time $t_0$ is in the state $x$ will at time $t$ pass
into one of the states of the set $E$.

If any additional knowledge of the states of the system at times
$t < t_0$ does not alter the probability, then it is natural to call this
class of stochastic processes that we have isolated processes without
aftereffect, or, by analogy with Markov chains, Markov processes.

The general concept of a stochastic process that is based on the earlier
presented axiomatics may be introduced as follows: Let $U$ be
a set of elementary events and $t$ a continuous parameter. A stochastic
process is defined as the function of two arguments:

$$\xi(t) = \varphi(e, t) \qquad (e \in U)$$

For every value of the parameter $t$, the function $\varphi(e, t)$ is a function
of $e$ only and, consequently, is a random variable. For every fixed
value of the argument $e$ (that is, for every elementary event) $\varphi(e, t)$
depends only on $t$ and is thus simply a function of one real argument.
Every such function is called a realization of the stochastic process
$\xi(t)$. We may regard a stochastic process either as a collection of random
variables $\xi(t)$ that depend on the parameter $t$, or as a collection
of the realizations of the process $\xi(t)$. Naturally, to define a process
it is necessary to specify a probability measure in the function space
of its realizations.

The present chapter will be devoted, in its entirety, to a study of 
processes without aftereffect and of stationary processes. 

Sec. 51. The Poisson Process 

Before beginning an exposition of some of the general results that
have now become classical, we will make a detailed study of one
example of a stochastic process without aftereffect that plays an important
role both in theory and in a diversity of applications.

Suppose a certain event occurs at random times. We are interested
in the number of occurrences of the event during the time interval
from 0 to $t$. Denote that number by $\xi(t)$. Relative to the process of
occurrence of the event we will presume that it is (1) stationary, (2)
without aftereffect, and (3) ordinary. The following meaning is attached
to these assumptions.

Stationarity signifies that for any group of a finite number of
nonoverlapping time intervals the probability of occurrence of a definite
number of events during the course of each one of them depends
on the numbers only and on the duration of the time intervals, but
is not changed by an identical shift in all the time intervals. In particular,
the probability of occurrence of $k$ demands (events) during
the time interval from $T$ to $T + t$ is independent of $T$ and is a function
only of $k$ and $t$.

The absence of aftereffect means that the probability of occurrence
of $k$ events during the time interval $(T, T + t)$ does not depend on
how many times the events occurred previously or how they occurred.
This assumption means that the conditional probability of occurrence
of $k$ events during the time interval $(T, T + t)$ under any assumption
of the occurrence of events prior to time $T$ coincides with the unconditional
probability. In particular the absence of aftereffect signifies
mutual independence of the occurrence of any number of events during
nonoverlapping intervals of time.

Ordinariness expresses the requirement of practical impossibility
of the occurrence of two or several events during a small time interval
$\Delta t$. Denote by $P_{>1}(\Delta t)$ the probability of occurrence of more than
one event in the time interval $\Delta t$. Then the condition of ordinariness,
precisely expressed, consists in the following:

$$P_{>1}(\Delta t) = o(\Delta t)$$

Our immediate problem will then be to determine the probability
$P_k(t)$ that during an interval of duration $t$ there will occur $k$ events.
By the assumptions made, these probabilities do not depend on the
location of the time interval. With this purpose in mind, we find that
for small $\Delta t$ the following equation holds:

$$P_1(\Delta t) = \lambda \Delta t + o(\Delta t)$$

where $\lambda$ is a constant.

Indeed, consider a time interval of duration 1 and denote by $p$
the probability that no event will occur within this period. Partition
the time interval into $n$ nonoverlapping equal parts. By virtue
of stationarity and the absence of aftereffect we have

$$p = \left[P_0\left(\frac{1}{n}\right)\right]^n$$

and so

$$P_0\left(\frac{1}{n}\right) = p^{1/n}$$

From this, for any integral $k$

$$P_0\left(\frac{k}{n}\right) = p^{k/n}$$

Now let $t$ be some nonnegative number. For any $n$ it is possible
to find a $k$ such that

$$\frac{k - 1}{n} \le t < \frac{k}{n}$$

Since the probability $P_0(t)$ is a decreasing function of time,

$$P_0\left(\frac{k - 1}{n}\right) \ge P_0(t) \ge P_0\left(\frac{k}{n}\right)$$

Thus, $P_0(t)$ satisfies the inequalities

$$p^{(k-1)/n} \ge P_0(t) \ge p^{k/n}$$

Now let $k$ and $n$ tend to infinity so that

$$\lim_{n \to \infty} \frac{k}{n} = t$$

From the foregoing it is clear that

$$P_0(t) = p^t$$

Since $P_0(t)$, being a probability, satisfies the inequalities

$$0 \le P_0(t) \le 1$$

three cases are possible: (1) $p = 0$; (2) $p = 1$; (3) $0 < p < 1$. The first
two cases are of little interest. In the first we have the equation $P_0(t) = 0$
for any $t$ and, hence, the probability is one that at least one event
will occur during a time interval of any length. In other words, infinitely
many events will occur with probability 1 during a time interval
of arbitrary duration. In the second case, $P_0(t) = 1$ and, consequently,
events do not occur. Only the third case is of interest;
here put $p = e^{-\lambda}$, where $\lambda$ is some positive number ($\lambda = -\ln p$).
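The squeezing argument used above can be illustrated numerically: for a given $p$ and $t$, the two bounds $p^{k/n}$ and $p^{(k-1)/n}$ close in on $p^t = e^{-\lambda t}$ as $n$ grows. The sketch below uses hypothetical values of $p$ and $t$ and an illustrative function name.

```python
from math import exp, floor, log

def sandwich(p: float, t: float, n: int):
    """Bounds p**(k/n) <= P0(t) <= p**((k-1)/n), where (k-1)/n <= t < k/n."""
    k = floor(t * n) + 1
    return p ** (k / n), p ** ((k - 1) / n)

p, t = 0.4, 0.73          # hypothetical values, chosen only for illustration
lam = -log(p)             # the constant lambda = -ln p
for n in (1, 10, 100, 10000):
    lower, upper = sandwich(p, t, n)
    print(n, lower, exp(-lam * t), upper)
```

The middle column does not depend on $n$, while the outer columns approach it from both sides.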

Summarizing, then, from the assumptions of stationarity and absence
of aftereffect, we have found that for any $t \ge 0$ (we have not yet
made use of the assumption of ordinariness)

$$P_0(t) = e^{-\lambda t} \qquad (1)$$

Clearly the equation

$$P_0(t) + P_1(t) + P_{>1}(t) = 1$$

holds for any value of $t$.

It follows from the foregoing that for small $t$

$$P_0(t) = 1 - \lambda t + o(t)$$

Hence for small $t$

$$P_1(t) = \lambda t + o(t) \qquad (2)$$

Now we can derive the formulas for the probabilities $P_k(t)$ for
$k \ge 1$. For this purpose, we determine the probability that during
time $t + \Delta t$ an event will occur exactly $k$ times. This may occur in
$k + 1$ different ways, namely:

(1) during time interval $t$ all $k$ events will occur, and none will
occur during time $\Delta t$;

(2) during an interval of length $t$ there will occur $k - 1$ events,
and during time $\Delta t$, only one; ...

($k + 1$) during the time interval $t$ the event will not occur once,
but during $\Delta t$ it will occur $k$ times.



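By the formula of total probability,

$$P_k(t + \Delta t) = \sum_{j=0}^{k} P_j(t) P_{k-j}(\Delta t)$$

(here both the condition of stationarity and absence of aftereffect
have been taken into account). Put

$$R_k = \sum_{j=0}^{k-2} P_j(t) P_{k-j}(\Delta t)$$

It is obvious that

$$R_k \le \sum_{j=0}^{k-2} P_{k-j}(\Delta t) = \sum_{s=2}^{k} P_s(\Delta t) \le \sum_{s=2}^{\infty} P_s(\Delta t) = P_{>1}(\Delta t) = o(\Delta t)$$

according to the condition of ordinariness.

Thus,

$$P_k(t + \Delta t) = P_k(t) P_0(\Delta t) + P_{k-1}(t) P_1(\Delta t) + o(\Delta t)$$

But from what has been proved,

$$P_0(\Delta t) = e^{-\lambda \Delta t} = 1 - \lambda \Delta t + o(\Delta t)$$

Furthermore, by (2),

$$P_1(\Delta t) = \lambda \Delta t + o(\Delta t)$$

and so

$$P_k(t + \Delta t) = (1 - \lambda \Delta t) P_k(t) + \lambda \Delta t P_{k-1}(t) + o(\Delta t)$$

From this we have

$$\frac{P_k(t + \Delta t) - P_k(t)}{\Delta t} = -\lambda P_k(t) + \lambda P_{k-1}(t) + o(1)$$

Since as $\Delta t \to 0$ the limit of the right-hand side of the equation
exists, the left-hand side also has a limit. As a result we get the
equation

$$\frac{dP_k(t)}{dt} = -\lambda P_k(t) + \lambda P_{k-1}(t) \qquad (3)$$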

for the determination of $P_k(t)$. Obviously the requirement of ordinariness
and the expression for $P_0(t)$ that we found lead to the following
initial conditions:

$$P_0(0) = 1; \qquad P_k(0) = 0 \ \text{ for } k \ge 1 \qquad (4)$$

It is easy to solve Equations (3) by making the substitution

$$P_k(t) = e^{-\lambda t} v_k(t) \qquad (5)$$

where $v_k(t)$ is the new desired function. Note that, by (1), $v_0(t) = 1$.
The relations (4) lead to the following initial conditions:

$$v_0(0) = 1 \ \text{ and } \ v_k(0) = 0 \ \text{ for } k \ge 1 \qquad (6)$$



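Substitution of (5) into (3) gives us

$$\frac{dv_k(t)}{dt} = \lambda v_{k-1}(t) \qquad (7)$$

In particular,

$$\frac{dv_1(t)}{dt} = \lambda \qquad (7')$$

Solution of Equations (7') and (7) in succession brings us (taking
into account the initial conditions) to the equations

$$v_1(t) = \lambda t, \qquad v_2(t) = \frac{(\lambda t)^2}{2!}, \qquad v_3(t) = \frac{(\lambda t)^3}{3!}$$

and, generally,

$$v_k(t) = \frac{(\lambda t)^k}{k!}$$

We thus get

$$P_k(t) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \qquad (8)$$

for any $k \ge 0$. Our problem is solved.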
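As a cross-check of the derivation, the system (3) with the initial conditions (4) can be integrated numerically and compared with formula (8). The sketch below uses a plain explicit Euler scheme, which is a choice made here for brevity and not a method from the text; the function names and parameter values are illustrative.

```python
from math import exp, factorial

def integrate_poisson_system(lam: float, t_end: float, k_max: int, steps: int):
    """Explicit Euler scheme for P_k'(t) = -lam*P_k(t) + lam*P_{k-1}(t),
    k = 0..k_max, with P_0(0) = 1 and P_k(0) = 0 for k >= 1."""
    dt = t_end / steps
    P = [1.0] + [0.0] * k_max
    for _ in range(steps):
        new = P[:]
        for k in range(k_max + 1):
            inflow = lam * P[k - 1] if k >= 1 else 0.0  # no inflow term for k = 0
            new[k] = P[k] + dt * (-lam * P[k] + inflow)
        P = new
    return P

def poisson_formula(lam: float, t: float, k: int) -> float:
    """Formula (8): (lam*t)^k / k! * exp(-lam*t)."""
    return (lam * t) ** k / factorial(k) * exp(-lam * t)

lam, t_end = 1.5, 2.0
numeric = integrate_poisson_system(lam, t_end, k_max=6, steps=20000)
for k, value in enumerate(numeric):
    print(k, value, poisson_formula(lam, t_end, k))
```

The two columns should agree up to the discretization error of the Euler step, which shrinks as `steps` grows.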

The requirements satisfied by the process of occurrence of the event
are fulfilled to a high degree of accuracy in numerous natural phenomena
and technological processes. We have instances such as the number
of spontaneously disintegrating atoms of a radioactive substance
during a given time interval and the number of cosmic-ray particles
impinging on a definite area during time $t$. If we have to do with some
kind of complicated electronic system consisting of a large number of
elements, each of which can break down with a small probability
during unit time and independently of the states of the other elements,
then the number of elements failing in the time interval $(0, t)$ is a
stochastic process. In many cases, this process is well described by
the Poisson process. There is literally no limit to the number of such
examples.

Here we shall examine two simple properties of Poisson processes. 

The time interval between two successive occurrences of some event
that interests us is a random variable, which we denote by $\tau$. Find
the probability distribution of $\tau$. Since it is obvious that the event
$\tau > t$ is equivalent to no event occurring in the time interval $t$ that
we record,

$$P\{\tau > t\} = e^{-\lambda t}$$

The desired distribution function is thus given by the formula

$$P\{\tau < t\} = 1 - e^{-\lambda t} \qquad (9)$$



This result may be interpreted physically in many ways. For example,
we can regard it as a time distribution of the free motion of a
molecule or as the time distribution between two failures of elements
in a complex electronic system.

Suppose we know that during an interval of length $t$ there occur
$n$ ($n > 0$) events of our process. The question is: under that condition,
how are the occurrences of these events distributed within the given
time interval? It turns out that the conditional distribution of times
of occurrence of the events is uniform in this time interval. Moreover,
the instants of occurrence of all $n$ events are mutually independent.

Denote by $B$ the event which consists in the occurrence of $n$ events
of the process during the time interval $(0, t)$. We know from the foregoing
that the probability of $B$ is

$$P\{B\} = \frac{(\lambda t)^n}{n!} e^{-\lambda t}$$

Insofar as the events have already come within the time interval
$(0, t)$, we can individualize them and concentrate our attention on
one of them. Denote by $A$ the event which consists in the fact that
the event which interests us occurred in the interval $(a, b)$ belonging
to $(0, t)$. Our problem is to determine the probability $P\{A/B\}$. By
the multiplication theorem,

$$P\{A/B\} = \frac{P\{AB\}}{P\{B\}}$$

We have to determine the probability of the joint occurrence of events
$A$ and $B$. For this purpose consider the events $C_{sr}$ which consist in
the fact that: (a) some kind of $s$ events of the process will fall in $(0, a)$
but the one that interests us will not be among them; (b) $r$ events of
the process including the one we are interested in will occur in the
interval $(a, b)$; (c) the remaining $n - r - s$ events will fall in the interval
$(b, t)$. Obviously, the events $C_{sr}$ are incompatible for distinct pairs
$(s, r)$ and for this reason

$$AB = \sum_{s=0}^{n-1} \sum_{r=1}^{n-s} C_{sr}$$

By the assumption concerning the absence of aftereffect, the probability
of obtaining $s$ events of the process in $(0, a)$, $r$ events
in $(a, b)$ and $n - s - r$ in the interval $(b, t)$ is

$$e^{-\lambda t} \, \frac{(\lambda a)^s \, [\lambda (b - a)]^r \, [\lambda (t - b)]^{n-s-r}}{s! \, r! \, (n - s - r)!} \qquad (10)$$

This expression, however, is different from the probability of the
event $C_{sr}$ for we did not take into account the necessity of our
particular event occurring in the interval $(a, b)$. In order to take
this circumstance into account we also have to multiply (10) by
the probability that the event which interests us will fall in the
interval $(a, b)$. This probability is equal to the ratio of the number
of ways of selecting $r - 1$ elements from $n - 1$ to the number of
ways of choosing $r$ elements out of $n$; that is, it is equal to

$$\frac{C_{n-1}^{r-1}}{C_n^r} = \frac{r}{n}$$

Thus,

$$P\{AB\} = \sum_{s=0}^{n-1} \sum_{r=1}^{n-s} \frac{r}{n} \, e^{-\lambda t} \, \frac{(\lambda a)^s \, [\lambda (b - a)]^r \, [\lambda (t - b)]^{n-s-r}}{s! \, r! \, (n - s - r)!}$$

Simple algebra leads us to the equation

$$P\{AB\} = \frac{b - a}{t} \cdot \frac{(\lambda t)^n}{n!} e^{-\lambda t}$$

Collecting the computations, we find

$$P\{A/B\} = \frac{b - a}{t} \qquad (11)$$

This equation proves the formulated result.
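The "simple algebra" behind (11) can be checked directly by evaluating the double sum over $s$ and $r$. In the sketch below, expression (10) is taken as the product of three Poisson probabilities for the intervals $(0, a)$, $(a, b)$ and $(b, t)$, which is an assumption about its exact printed form; the function names and the numerical values are illustrative.

```python
from math import exp, factorial

def prob_AB(lam: float, t: float, a: float, b: float, n: int) -> float:
    """Sum over (s, r) of expression (10) multiplied by the factor r/n."""
    total = 0.0
    for s in range(0, n):
        for r in range(1, n - s + 1):
            rest = n - s - r
            term = (
                exp(-lam * t)
                * (lam * a) ** s / factorial(s)
                * (lam * (b - a)) ** r / factorial(r)
                * (lam * (t - b)) ** rest / factorial(rest)
            )
            total += term * r / n
    return total

def prob_B(lam: float, t: float, n: int) -> float:
    """Probability of exactly n events in (0, t)."""
    return (lam * t) ** n / factorial(n) * exp(-lam * t)

lam, t, a, b, n = 2.0, 5.0, 1.0, 2.5, 6
print(prob_AB(lam, t, a, b, n) / prob_B(lam, t, n), (b - a) / t)
```

Both printed numbers should coincide up to rounding, whatever values of $\lambda$ and $n$ are used, which reflects the fact that the conditional distribution does not depend on the intensity of the process.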

We note that the theory developed in this section may be applied 
not only on the assumption that the parameter t plays the role of 
time. With this in mind, we add a further example. 


Example. Points are dispersed in space in accordance with the
following requirements:

(1) the probability that $k$ points will be found in the region $G$
depends solely on the volume $v$ of the region but is independent both
of its shape and its position in space; denote this probability by the
symbol $P_k(v)$;

(2) the numbers of points that fall in nonoverlapping regions
are independent random variables;

(3) $\sum_{k=2}^{\infty} P_k(\Delta v) = o(\Delta v)$.

The conditions that have been imposed are nothing other than
conditions of stationarity, absence of aftereffect, and ordinariness.
And so

$$P_n(v) = \frac{(av)^n}{n!} e^{-av}$$

If minute particles of some substance are suspended in a liquid,
then under the impacts of surrounding molecules these particles will
be in a state of constant chaotic motion (Brownian motion). As a
result, at each instant of time we have a random distribution of particles
in space, which is exactly what we have been speaking about.
According to the theory of this example, we must consider that the
distribution of particles entering some definite region will be subject
to the Poisson law.

Table 12 compares the results of experiments with particles of
gold suspended in water and computations by Poisson’s law (taken
from a paper by Smolukhovsky).

TABLE 12

Number of      Number of         Frequency,    $\lambda^n e^{-\lambda}/n!$    Number of cases
particles $n$  observed cases $m$  $m/518$                                    computed

    0              112             0.216          0.213                  110
    1              168             0.325          0.328                  173
    2              130             0.251          0.253                  131
    3               69             0.133          0.130                   67
    4               32             0.062          0.050                   26
    5                5             0.010          0.016                    8
    6                1             0.002          0.004                    2
    7                1             0.002          0.001                    1


The constant $\lambda = av$ that defines the Poisson law is chosen equal
to the arithmetic mean of the observed number of particles, that is

$$\lambda = \frac{0 \times 112 + 1 \times 168 + 2 \times 130 + 3 \times 69 + 4 \times 32 + 5 \times 5 + 6 \times 1 + 7 \times 1}{518} \approx 1.54$$
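The comparison in Table 12 can be recomputed from the observed counts. The sketch below takes the counts from the table, forms the arithmetic mean as the Poisson parameter and prints the observed frequency, the Poisson probability and the expected number of cases for each row; small differences from the printed table in the last digit may arise from rounding.

```python
from math import exp, factorial

# Observed counts from Table 12: number of particles -> number of observed cases.
observed = {0: 112, 1: 168, 2: 130, 3: 69, 4: 32, 5: 5, 6: 1, 7: 1}

total = sum(observed.values())                         # total number of observations
lam = sum(n * m for n, m in observed.items()) / total  # arithmetic mean of the counts

def poisson_pmf(n: int, lam: float) -> float:
    """Poisson probability lam^n * exp(-lam) / n!."""
    return lam**n * exp(-lam) / factorial(n)

print(f"total = {total}, lambda = {lam:.3f}")
for n, m in observed.items():
    p = poisson_pmf(n, lam)
    print(n, m, round(m / total, 3), round(p, 3), round(total * p))
```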


Sec. 52. Conditional Distribution Functions 
and Bayes' Formula 


For the further development of the theory we have to generalize
the concept of conditional probability that was introduced in the first
chapter to the case of an infinite number of possible conditions. In
particular, we have to introduce the concept of a conditional distribution
function with respect to some random variable.

Consider a certain event $B$ and a random variable $\xi$ with a distribution
function $F(x)$. Denote by $A_{\alpha\beta}$ the event that

$$x - \alpha \le \xi < x + \beta$$

By virtue of the definitions of Chapter 1,

$$P\{A_{\alpha\beta} B\} = P\{A_{\alpha\beta}\} P\{B/A_{\alpha\beta}\} = [F(x + \beta) - F(x - \alpha)] P\{B/A_{\alpha\beta}\}$$

which gives

$$P\{B/A_{\alpha\beta}\} = \frac{P\{A_{\alpha\beta} B\}}{F(x + \beta) - F(x - \alpha)}$$

The limit

$$\lim_{\alpha, \beta \to 0} P\{B/A_{\alpha\beta}\} = \lim_{\alpha, \beta \to 0} \frac{P\{A_{\alpha\beta} B\}}{F(x + \beta) - F(x - \alpha)}$$

if it exists,* is called the conditional probability of event $B$ provided
that $\xi = x$ and is denoted by the symbol $P\{B/x\}$. It is obvious that for
fixed $x$, $P\{B/x\}$ will be a finitely additive function of $B$ defined on
some field of events.

Under certain conditions, which are practically always fulfilled,
$P\{B/x\}$ will have all the properties of ordinary probability satisfying
the Axioms 1 to 3 of Sec. 8.

If $\eta$ is a random variable and $B$ denotes the event that $\eta < y$, then
the function $\Phi(y/x) = P\{\eta < y/x\}$, which, as it is easy to see, will
be a distribution function, is called the conditional distribution function
of the variable $\eta$ provided that $\xi = x$.

It is obvious that if $F(x, y)$ is the distribution function of a pair
of random variables $\xi$ and $\eta$, then

$$\Phi(y/x) = \lim_{\alpha, \beta \to 0} \frac{F(x + \beta, y) - F(x - \alpha, y)}{F(x + \beta, \infty) - F(x - \alpha, \infty)}$$

provided that this limit exists.

If the function $P\{B/x\}$ is integrable with respect to $F(x)$, we
have the formula of total probability:

$$P\{B\} = \int P\{B/x\}\, dF(x)$$

To prove this formula we divide the interval of variation of
the variable $\xi$ by the points $x_i$ ($i = 0, \pm 1, \pm 2, \ldots$) into the
subintervals $x_i \le \xi < x_{i+1}$. Denote by $A_i$ the event $x_i \le \xi < x_{i+1}$.

By virtue of the extended axiom of addition we have

$$P\{B\} = \sum_i P\{B A_i\} = \sum_i P\{B/A_i\} [F(x_{i+1}) - F(x_i)]$$

We now partition the subintervals $(x_i, x_{i+1})$ into still smaller
subintervals so that the maximal length of the subintervals thus
obtained should tend to zero. From this, by the definition of
conditional probability and by the Stieltjes integral we get

$$P\{B\} = \int P\{B/x\}\, dF(x)$$

In particular,

$$\Phi(y) = P\{\eta < y\} = \int \Phi(y/x)\, dF(x) \qquad (1)$$


* This limit exists for almost all values of x, in the sense of the measure 
defined by the function F {x). 





If there exists a probability density function of the variable $\eta$, then

$$\varphi(y) = \int \varphi(y/x)\, dF(x) \qquad (1')$$

where $\varphi(y/x)$ is the conditional density function of the variable $\eta$.

Example. To illustrate the use of formula (1), let us consider the
following problem in the theory of gunfire. Errors of two kinds are
involved in firing at a target: (1) errors in defining the position of the
target and (2) errors of fire due to a large number of diverse causes
(variations in the size of the charge in the shell, irregularities of machining
of the casing of the shell, errors in sighting, slight fluctuations
in atmospheric conditions, etc.). Errors of the second kind are called
technical dispersion.

A total of $n$ independent shots are fired at one defined position of
the target. It is required to determine the probability of at least one hit.

For the sake of simplicity, we confine ourselves to a consideration
of a one-dimensional target of size $2a$ and the shell will be assumed
a point. Denote by $f(x)$ the density function of the position of the target
and by $\varphi_i(x)$ the density function for the points of impact of the
$i$th shell.

If the centre of the target lies at point $z$, then the probability of
hitting the target in the $i$th shot is equal to the probability of falling
in the interval $(z - a, z + a)$, that is, it is equal* to

$$\int_{z-a}^{z+a} \varphi_i(x)\, dx$$

The conditional probability of a miss in the $i$th shot provided the
centre of the target lies at point $z$ is

$$1 - \int_{z-a}^{z+a} \varphi_i(x)\, dx$$

The conditional probability of a miss for all $n$ shots (given the
same condition) is equal to

$$\prod_{i=1}^{n} \left(1 - \int_{z-a}^{z+a} \varphi_i(x)\, dx\right)$$

Whence we conclude that the probability of at least one hit, given
that the centre of the target is at $z$, is equal to

$$1 - \prod_{i=1}^{n} \left(1 - \int_{z-a}^{z+a} \varphi_i(x)\, dx\right)$$



* We assume here that the determination of the target position and the
technical dispersion are independent.

The unconditional probability of at least one hit (by formula (1))
is thus

$$\int f(z) \left[1 - \prod_{i=1}^{n} \left(1 - \int_{z-a}^{z+a} \varphi_i(x)\, dx\right)\right] dz$$

If the firing conditions do not change from shot to shot, then
$\varphi_i(x) = \varphi(x)$ ($i = 1, 2, \ldots, n$) and consequently the probability
of at least one hit is

$$\int f(z) \left[1 - \left(1 - \int_{z-a}^{z+a} \varphi(x)\, dx\right)^n\right] dz$$
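The last formula lends itself to direct numerical evaluation. The sketch below assumes, purely for illustration, that both the target position and the points of impact are normally distributed about zero with standard deviations `sigma_target` and `sigma_shot`; these densities, the trapezoid rule and all names are assumptions made here, not part of the example in the text.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x: float, sigma: float) -> float:
    return exp(-0.5 * (x / sigma) ** 2) / (sigma * sqrt(2 * pi))

def normal_cdf(x: float, sigma: float) -> float:
    return 0.5 * (1 + erf(x / (sigma * sqrt(2))))

def prob_at_least_one_hit(n, a, sigma_target, sigma_shot, grid=4001, width=8.0):
    """Trapezoid rule for  int f(z) [1 - (1 - int_{z-a}^{z+a} phi(x) dx)^n] dz."""
    z_max = width * sigma_target
    dz = 2 * z_max / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -z_max + i * dz
        # inner integral: probability that one shot lands in (z - a, z + a)
        hit = normal_cdf(z + a, sigma_shot) - normal_cdf(z - a, sigma_shot)
        value = normal_pdf(z, sigma_target) * (1 - (1 - hit) ** n)
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * value * dz
    return total

for shots in (1, 2, 5, 10):
    print(shots, prob_at_least_one_hit(shots, a=1.0, sigma_target=1.0, sigma_shot=0.5))
```

For a single shot the result can be compared with a closed form: the difference between the impact point and the target centre is then normal with variance `sigma_target**2 + sigma_shot**2`, so the hit probability is `erf(a / sqrt(2 * (sigma_target**2 + sigma_shot**2)))`.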


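As before, let $A_i$ denote the event $x_i \le \xi < x_{i+1}$. According to
the classical theorem of Bayes

$$P\{A_i/B\} = \frac{P\{A_i\} P\{B/A_i\}}{P\{B\}}$$

If $F(x) = P\{\xi < x\}$ and $P\{\xi < x/B\}$ have continuous derivatives
with respect to $x$, then, using the Lagrange theorem, we get

$$P\{A_i/B\} = p_\xi(\bar{x}_i/B)(x_{i+1} - x_i) = \frac{F'(\tilde{x}_i) P\{B/A_i\}}{P\{B\}} (x_{i+1} - x_i)$$

where $x_i < \bar{x}_i < x_{i+1}$ and $x_i < \tilde{x}_i < x_{i+1}$. In the limit, when
$x_i \to x$ and $x_{i+1} \to x$, we get (writing $p(x) = F'(x)$ for the density of $\xi$)

$$p_\xi(x/B) = \frac{p(x) P\{B/x\}}{P\{B\}}$$

or

$$p_\xi(x/B) = \frac{p(x) P\{B/x\}}{\int P\{B/x\}\, p(x)\, dx} \qquad (2)$$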


It is natural to call this equality Bayes' formula. 

Now let the event $B$ consist in the fact that some random variable
$\eta$ takes on a value between $y - \alpha$ and $y + \beta$, and let the
conditional distribution function $\Phi(y/x)$ of the variable $\eta$ have,
for every $x$, the continuous density $p_\eta(y/x)$. Then, as follows from
Equation (2), if $P\{B/x\}/(\alpha + \beta)$ tends to $p_\eta(y/x)$ uniformly in $x$ as $\alpha$
and $\beta$ tend to zero, then the following equality holds:

$$p_\xi(x/y) = \frac{p(x)\, p_\eta(y/x)}{\int p_\eta(y/x)\, p(x)\, dx}$$

We shall make extensive use of this formula in the next chapter.



Sec. 53. Generalized Markov Equation 


We now take up the study of stochastic processes without aftereffect,
confining ourselves only to the most elementary problems.
We assume, in particular, that the set of possible states of the system
is the set of real numbers. Thus, a stochastic process is a set of random
variables $\xi(t)$ dependent on a single real parameter $t$. We shall call
parameter $t$ the time and speak of the state of the system at one or
another instant of time.

We will obtain a complete probabilistic characteristic of a process
without aftereffect by specifying the function $F(t, x; \tau, y)$, which
is equal to the probability that at time $\tau$ the random variable $\xi(\tau)$
will take on a value less than $y$ if it is known that at time $t$ ($t < \tau$) we
had the equality $\xi(t) = x$. Additional knowledge about the states of
the system at times prior to $t$ does not alter the function $F(t, x; \tau, y)$
for processes without aftereffect.

Let us now note some conditions that the function $F(t, x; \tau, y)$
must satisfy. First of all, since it is a distribution function, the following
equations must hold for any* $x$, $t$ and $\tau$:

(1) $\lim_{y \to -\infty} F(t, x; \tau, y) = 0$, $\lim_{y \to +\infty} F(t, x; \tau, y) = 1$;

(2) the function $F(t, x; \tau, y)$ is continuous from the left with respect
to the argument $y$.

Now suppose that the function $F(t, x; \tau, y)$ is continuous with respect
to $t$, $\tau$ and with respect to $x$.

Consider the instants of time $t$, $s$, $\tau$ ($t < s < \tau$). Since the system passes
from state $x$ at time $t$ to one of the states of the interval $(z, z + dz)$
at time $s$ with probability $d_z F(t, x; s, z)$ and from state $z$ at time $s$
to a state less than $y$ at time $\tau$ with probability $F(s, z; \tau, y)$, we
find, from formula (1) of Sec. 52, that

$$F(t, x; \tau, y) = \int F(s, z; \tau, y)\, d_z F(t, x; s, z)$$

It is natural to call this equality the generalized Markov equation,
for it represents an extension of Equation (1), Sec. 17, of the theory
of Markov chains to the theory of stochastic processes and in this
theory plays just as important a part as the aforementioned identity
does in the theory of Markov chains.

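The probability $F(t, x; \tau, y)$ is so far defined only for $\tau > t$. We
extend this definition by taking

$$\lim_{\tau \to t + 0} F(t, x; \tau, y) = \lim_{t \to \tau - 0} F(t, x; \tau, y) = E(x, y) = \begin{cases} 0 & \text{for } y \le x \\ 1 & \text{for } y > x \end{cases}$$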

* Note that the parameter t (time) is ordinarily specified on the half-line 



If there exists a density

$$f(t, x; \tau, y) = \frac{\partial}{\partial y} F(t, x; \tau, y)$$

the obvious equalities

$$\int_{-\infty}^{y} f(t, x; \tau, z)\, dz = F(t, x; \tau, y)$$

$$\int f(t, x; \tau, z)\, dz = 1$$

hold.

For this case, the generalized Markov equation should be written
in the form

$$f(t, x; \tau, y) = \int f(s, z; \tau, y)\, f(t, x; s, z)\, dz$$
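The density form of the generalized Markov equation can be tested numerically on a concrete transition density. The sketch below assumes, only as an example, the normal density with mean $x$ and variance $\tau - t$ (the transition density of the simplest diffusion process); this choice, the trapezoid rule and the names are not taken from the text.

```python
from math import exp, pi, sqrt

def f(t: float, x: float, tau: float, y: float) -> float:
    """Assumed transition density: normal in y with mean x and variance tau - t."""
    var = tau - t
    return exp(-((y - x) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def markov_rhs(t, x, s, tau, y, z_max=12.0, grid=4801):
    """Trapezoid approximation of  int f(s, z; tau, y) f(t, x; s, z) dz."""
    dz = 2 * z_max / (grid - 1)
    total = 0.0
    for i in range(grid):
        z = -z_max + i * dz
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * f(s, z, tau, y) * f(t, x, s, z) * dz
    return total

t, s, tau, x, y = 0.0, 0.7, 2.0, 0.3, -0.5
print(f(t, x, tau, y), markov_rhs(t, x, s, tau, y))
```

The two printed values should agree closely for any intermediate instant $s$ between $t$ and $\tau$, which is exactly what the equation asserts.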


Sec. 54. Continuous Stochastic Processes. Kolmogorov's Equations

We say that a stochastic process $\xi(t)$ is continuous if during small
time intervals appreciable increments of $\xi(t)$ are obtainable only with a
small probability. Here we demand that the process $\xi(t)$ be more
strongly continuous, namely: no matter what the constant $\delta$ ($\delta > 0$),
the following relation holds:

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|y - x| > \delta} d_y F(t - \Delta t, x; t, y) = 0 \qquad (1)$$

Our immediate task is to derive the differential equations which,
upon fulfillment of certain conditions, will be satisfied by the function
$F(t, x; \tau, y)$ that governs a continuous stochastic process without
aftereffect. These equations were first proved in rigorous fashion by
A. N. Kolmogorov (though the second one had occurred prior in the
works of physicists) and are called Kolmogorov's equations.

We assume that

(1) the partial derivatives

$$\frac{\partial F(t, x; \tau, y)}{\partial x}, \qquad \frac{\partial^2 F(t, x; \tau, y)}{\partial x^2}$$

exist and are continuous for all values of $t$, $x$, $y$ and $\tau > t$;

(2) for arbitrary $\delta > 0$ there exist the limits

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|y - x| < \delta} (y - x)\, d_y F(t - \Delta t, x; t, y) = a(t, x) \qquad (2)$$

and

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|y - x| < \delta} (y - x)^2\, d_y F(t - \Delta t, x; t, y) = b(t, x) \qquad (3)$$

and this convergence is uniform in $x$.*

The left-hand sides of (2) and (3) depend on $\delta$. However, this
dependence, by virtue of the definition of the continuity of a process
(that is, by virtue of (1)), is only apparent.


Kolmogorov’s First Equation. If the foregoing conditions (1) and
(2) are fulfilled, the function $F(t, x; \tau, y)$ satisfies the equation

$$\frac{\partial F(t, x; \tau, y)}{\partial t} = -a(t, x) \frac{\partial F(t, x; \tau, y)}{\partial x} - \frac{b(t, x)}{2} \frac{\partial^2 F(t, x; \tau, y)}{\partial x^2} \qquad (4)$$

Proof. According to the generalized Markov equation,

$$F(t - \Delta t, x; \tau, y) = \int F(t, z; \tau, y)\, d_z F(t - \Delta t, x; t, z)$$

Also, by virtue of the properties of a distribution function,

$$F(t, x; \tau, y) = \int F(t, x; \tau, y)\, d_z F(t - \Delta t, x; t, z)$$

From these equations we conclude that

$$\frac{F(t - \Delta t, x; \tau, y) - F(t, x; \tau, y)}{\Delta t} = \frac{1}{\Delta t} \int [F(t, z; \tau, y) - F(t, x; \tau, y)]\, d_z F(t - \Delta t, x; t, z)$$

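* A. N. Kolmogorov proved the existence of the limits $a(t, x)$ and $b(t, x)$
on the assumption that for given $x$ and $s$ the determinant

$$\begin{vmatrix} \dfrac{\partial}{\partial x} f(s, x, t', y') & \dfrac{\partial}{\partial x} f(s, x, t'', y'') \\[2mm] \dfrac{\partial^2}{\partial x^2} f(s, x, t', y') & \dfrac{\partial^2}{\partial x^2} f(s, x, t'', y'') \end{vmatrix}$$

does not vanish identically for arbitrary $t'$, $t''$, $y'$, $y''$. Literally repeating Kolmogorov’s
reasoning, one can prove that from (1) and upon the assumption that
for given $x$ and $s$ the determinant

$$\begin{vmatrix} \dfrac{\partial}{\partial x} F(s, x, t', y') & \dfrac{\partial}{\partial x} F(s, x, t'', y'') \\[2mm] \dfrac{\partial^2}{\partial x^2} F(s, x, t', y') & \dfrac{\partial^2}{\partial x^2} F(s, x, t'', y'') \end{vmatrix}$$

does not vanish identically for arbitrary $t'$, $t''$, $y'$, $y''$, there follows the existence
of the limits $a(t, x)$ and $b(t, x)$.

At the end of this section we will find out what physical meaning the
functions $a$ and $b$ have.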


Sec. 54. Continuous Stochastic Processes. Kolmogorov's Equations 


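By Taylor’s formula, given the assumptions we have made, the
following equality holds:

\[ F(t, z; \tau, y) = F(t, x; \tau, y) + (z-x)\, \frac{\partial F(t, x; \tau, y)}{\partial x} + \frac{1}{2}(z-x)^2\, \frac{\partial^2 F(t, x; \tau, y)}{\partial x^2} + o[(z-x)^2] \]

The following analytical transformations do not require any explanations:

\[ \begin{aligned}
\frac{F(t-\Delta t, x; \tau, y) - F(t, x; \tau, y)}{\Delta t}
&= \frac{1}{\Delta t} \int_{|z-x| \ge \delta} [F(t, z; \tau, y) - F(t, x; \tau, y)]\, d_z F(t-\Delta t, x; t, z) \\
&\quad + \frac{1}{\Delta t} \int_{|z-x| < \delta} [F(t, z; \tau, y) - F(t, x; \tau, y)]\, d_z F(t-\Delta t, x; t, z) \\
&= \frac{1}{\Delta t} \int_{|z-x| \ge \delta} [F(t, z; \tau, y) - F(t, x; \tau, y)]\, d_z F(t-\Delta t, x; t, z) \\
&\quad + \frac{\partial F(t, x; \tau, y)}{\partial x} \cdot \frac{1}{\Delta t} \int_{|z-x| < \delta} (z-x)\, d_z F(t-\Delta t, x; t, z) \\
&\quad + \frac{1}{2}\, \frac{\partial^2 F(t, x; \tau, y)}{\partial x^2} \cdot \frac{1}{\Delta t} \int_{|z-x| < \delta} [(z-x)^2 + o((z-x)^2)]\, d_z F(t-\Delta t, x; t, z)
\end{aligned} \tag{5} \]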


We now pass to the limit, letting Δt → 0. The first term on
the right-hand side, by virtue of (1), has the limit 0. The second
term, by (2), has the limit a(t, x) ∂F/∂x. Finally, the third term can
differ from (1/2) b(t, x) ∂²F/∂x² by a summand that tends to zero as
δ → 0. But since the left-hand side of the equality is independent
of δ and the limiting values just mentioned are independent of δ,
the limit of the right-hand side exists and is equal to

\[ a(t, x)\, \frac{\partial F(t, x; \tau, y)}{\partial x} + \frac{1}{2}\, b(t, x)\, \frac{\partial^2 F(t, x; \tau, y)}{\partial x^2} \]

From this we conclude that the limit

\[ \lim_{\Delta t \to 0} \frac{F(t-\Delta t, x; \tau, y) - F(t, x; \tau, y)}{\Delta t} = -\frac{\partial F(t, x; \tau, y)}{\partial t} \]

exists.





Equality (5) leads us to Equation (4). 

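If it is assumed that a density function

\[ f(t, x; \tau, y) = \frac{\partial}{\partial y} F(t, x; \tau, y) \]

exists, then simple differentiation of (4) with respect to y shows that the density
f(t, x; τ, y) satisfies the equation

\[ \frac{\partial f(t, x; \tau, y)}{\partial t} = -a(t, x)\, \frac{\partial f(t, x; \tau, y)}{\partial x} - \frac{b(t, x)}{2}\, \frac{\partial^2 f(t, x; \tau, y)}{\partial x^2} \]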




Let us now derive Kolmogorov’s second equation. In doing so
we will not strive for the greatest possible generality and will
make assumptions that are not required by the essence of the
matter. Besides the assumptions that have already been made we
impose on the function F(t, x; τ, y) the following additional restrictions:

(3) there exists a probability density function

\[ f(t, x; \tau, y) = \frac{\partial F(t, x; \tau, y)}{\partial y} \]

(4) there exist the continuous derivatives

\[ \frac{\partial f(t, x; \tau, y)}{\partial \tau}, \qquad \frac{\partial}{\partial y}[a(\tau, y)\, f(t, x; \tau, y)], \qquad \frac{\partial^2}{\partial y^2}[b(\tau, y)\, f(t, x; \tau, y)] \]


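Kolmogorov’s Second Equation *. If conditions 1 to 4 are fulfilled,
then for a continuous stochastic process without aftereffect the
density f(t, x; τ, y) satisfies the equation

\[ \frac{\partial f(t, x; \tau, y)}{\partial \tau} = -\frac{\partial}{\partial y}[a(\tau, y)\, f(t, x; \tau, y)] + \frac{1}{2}\, \frac{\partial^2}{\partial y^2}[b(\tau, y)\, f(t, x; \tau, y)] \tag{6} \]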


Proof. Let a and b (a < b) be some numbers and R(y) be a nonnegative
continuous function with continuous derivatives up to
second order inclusive. Besides, we will demand that

\[ R(y) = 0 \quad \text{for } y < a \text{ and } y > b \]

From the condition of continuity of the function R(y) and its
derivatives we conclude that

\[ R(a) = R(b) = R'(a) = R'(b) = R''(a) = R''(b) = 0 \tag{7} \]


* Kolmogorov’s second equation was earlier derived by the physicists Fokker 
and Planck in connection with the development of diffusion theory. 





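First, note that

\[ \int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy = \frac{\partial}{\partial \tau} \int_a^b f(t, x; \tau, y)\, R(y)\, dy = \lim_{\Delta\tau \to 0} \int \frac{f(t, x; \tau+\Delta\tau, y) - f(t, x; \tau, y)}{\Delta\tau}\, R(y)\, dy \]

According to the generalized Markov equation,

\[ f(t, x; \tau+\Delta\tau, y) = \int f(t, x; \tau, z)\, f(\tau, z; \tau+\Delta\tau, y)\, dz \]

and so

\[ \begin{aligned}
\int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy
&= \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \left[ \iint f(t, x; \tau, z)\, f(\tau, z; \tau+\Delta\tau, y)\, R(y)\, dz\, dy - \int f(t, x; \tau, y)\, R(y)\, dy \right] \\
&= \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \left[ \int f(t, x; \tau, z) \int f(\tau, z; \tau+\Delta\tau, y)\, R(y)\, dy\, dz - \int f(t, x; \tau, y)\, R(y)\, dy \right] \\
&= \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \int f(t, x; \tau, y) \left[ \int f(\tau, y; \tau+\Delta\tau, z)\, R(z)\, dz - R(y) \right] dy
\end{aligned} \]

The transformations that have been performed are obvious: the
first time we changed the order of integration and the second time
we changed the notation of the variables of integration (we replaced
y by z and z by y).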

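By Taylor’s formula

\[ R(z) = R(y) + (z-y)\, R'(y) + \frac{1}{2}(z-y)^2 R''(y) + o[(z-y)^2] \]

Since, by virtue of the boundedness of the function R(z) and the
condition (1),

\[ \int_{|y-z| \ge \delta} f(\tau, y; \tau+\Delta\tau, z)\, R(z)\, dz = o(\Delta\tau) \]

and

\[ \int_{|y-z| < \delta} f(\tau, y; \tau+\Delta\tau, z)\, dz = 1 + o(\Delta\tau) \]

it follows that

\[ \begin{aligned}
\int f(\tau, y; \tau+\Delta\tau, z)\, R(z)\, dz - R(y)
&= R'(y) \int_{|y-z| < \delta} (z-y)\, f(\tau, y; \tau+\Delta\tau, z)\, dz \\
&\quad + \frac{1}{2} R''(y) \int_{|y-z| < \delta} [(z-y)^2 + o((z-y)^2)]\, f(\tau, y; \tau+\Delta\tau, z)\, dz + o(\Delta\tau)
\end{aligned} \]

Consequently,

\[ \begin{aligned}
\int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy = \lim_{\Delta\tau \to 0} \frac{1}{\Delta\tau} \int f(t, x; \tau, y) \Big\{ & R'(y) \int_{|y-z| < \delta} (z-y)\, f(\tau, y; \tau+\Delta\tau, z)\, dz \\
& + \frac{1}{2} R''(y) \int_{|y-z| < \delta} [(z-y)^2 + o((z-y)^2)]\, f(\tau, y; \tau+\Delta\tau, z)\, dz + o(\Delta\tau) \Big\}\, dy
\end{aligned} \]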


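We pass to the limit, letting Δτ → 0. By the assumption of uniform
convergence to the limits in (2) and (3) we conclude that
the preceding equality may be written in the form

\[ \int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy = \int f(t, x; \tau, y) \left[ a(\tau, y)\, R'(y) + \frac{1}{2}\, b(\tau, y)\, R''(y) \right] dy \]

Since R'(y) = R''(y) = 0 for y ≤ a and y ≥ b, it follows that

\[ \int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy = \int_a^b f(t, x; \tau, y) \left[ a(\tau, y)\, R'(y) + \frac{1}{2}\, b(\tau, y)\, R''(y) \right] dy \tag{8} \]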


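Taking advantage of the formula of integration by parts and of
equalities (7), we find

\[ \int_a^b f(t, x; \tau, y)\, a(\tau, y)\, R'(y)\, dy = -\int_a^b R(y)\, \frac{\partial}{\partial y}[a(\tau, y)\, f(t, x; \tau, y)]\, dy \]

\[ \int_a^b f(t, x; \tau, y)\, b(\tau, y)\, R''(y)\, dy = \int_a^b R(y)\, \frac{\partial^2}{\partial y^2}[b(\tau, y)\, f(t, x; \tau, y)]\, dy \]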





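Substituting the expressions thus obtained into (8), we get

\[ \int_a^b \frac{\partial f(t, x; \tau, y)}{\partial \tau}\, R(y)\, dy = \int_a^b \left\{ -\frac{\partial}{\partial y}[a(\tau, y)\, f(t, x; \tau, y)] + \frac{1}{2}\, \frac{\partial^2}{\partial y^2}[b(\tau, y)\, f(t, x; \tau, y)] \right\} R(y)\, dy \]

This equation can obviously be written as

\[ \int_a^b \left\{ \frac{\partial f(t, x; \tau, y)}{\partial \tau} + \frac{\partial}{\partial y}[a(\tau, y)\, f(t, x; \tau, y)] - \frac{1}{2}\, \frac{\partial^2}{\partial y^2}[b(\tau, y)\, f(t, x; \tau, y)] \right\} R(y)\, dy = 0 \tag{9} \]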

Since the function R(y) is arbitrary, (6) follows from the last identity.
Indeed, suppose that this is not so. Then there exists a number
quadruple (t, x; τ, y) such that the expression in the braces of (9)
is different from zero. By virtue of the assumptions made this expression
is a continuous function; hence, there will be an interval
α < y < β where the expression retains its sign. If a ≤ α and β ≤ b, then
we assume R(y) = 0 for y ≤ α and y ≥ β and R(y) > 0 for α < y < β.
Given this choice of R(y), the integral on the left-hand side of (9)
must be different from zero. We arrive at a contradiction. Thus,
our assumption is erroneous and, hence, (6) follows from (9).

Naturally, the basic problem that has to be solved does not consist
in verifying that the given function f(t, x; τ, y) satisfies the Kolmogorov
equations, but in seeking an unknown function f(t, x; τ, y) on the basis
of these equations in which the coefficients a(t, x) and b(t, x) are
assumed to be known. What is sought here is not, of course, just any
solution of the Kolmogorov equations, but only those which satisfy
the following requirements:

1. \[ f(t, x; \tau, y) \ge 0 \quad \text{for all } t, x, \tau, y \]

2. \[ \int f(t, x; \tau, y)\, dy = 1 \]

and for any δ > 0

3. \[ \lim_{\tau \to t} \int_{|y-x| > \delta} f(t, x; \tau, y)\, dy = 0 \tag{10} \]

We shall not undertake to clarify the conditions that must be imposed
on the functions a(t, x) and b(t, x) for a solution of the Kolmogorov
equations to exist that would satisfy the enumerated requirements
and would also be unique.






We shall slightly strengthen the requirement of continuity in order
to elucidate the physical meaning of the coefficients a(t, x) and b(t, x):
in place of (1) we assume that for any δ > 0 the following relation
holds:

\[ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int_{|y-x| > \delta} (y-x)^2\, d_y F(t-\Delta t, x; t, y) = 0 \tag{1'} \]


It is easy to see that (1) follows from (1'). The requirements 2
and 3 can now be written differently, namely:

\[ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int (y-x)\, d_y F(t-\Delta t, x; t, y) = a(t, x) \tag{2'} \]

and

\[ \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int (y-x)^2\, d_y F(t-\Delta t, x; t, y) = b(t, x) \tag{3'} \]

The other requirements and also the final conclusions due to
substitution of (1') for (1) remain unchanged. Since

\[ \int (y-x)\, d_y F(t-\Delta t, x; t, y) = M[\xi(t) - \xi(t-\Delta t)] \]

is the expectation of the variation of ξ(t) during time Δt, and

\[ \int (y-x)^2\, d_y F(t-\Delta t, x; t, y) = M[\xi(t) - \xi(t-\Delta t)]^2 \]

is the expectation of the square of the variation of ξ(t) and, consequently,
is proportional to the kinetic energy (under the assumption that
ξ(t) is the coordinate of a point moving under the effect of chance
actions), it is clear from (2') and (3') that a(t, x) is the average rate of
change of ξ(t), and b(t, x) is proportional to the mean kinetic energy
of the system under study.
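
As a numerical illustration of (2') and (3'), the following sketch assumes a hypothetical short-time transition law: over a step Δt the increment is normal with mean a·Δt and variance b·Δt, with constant a and b chosen only for the example. The names `density` and `scaled_moment` are not from the text. The scaled first moment should return a, and the scaled second moment should approach b as Δt shrinks (it equals b + a²Δt exactly for this law).

```python
import math

def density(dt, x, y, a=0.7, b=2.0):
    # Assumed transition density over a step dt: normal, mean x + a*dt, variance b*dt.
    return math.exp(-(y - x - a * dt) ** 2 / (2 * b * dt)) / math.sqrt(2 * math.pi * b * dt)

def scaled_moment(k, dt, x=0.0, a=0.7, b=2.0, n=20001, width=12.0):
    # (1/dt) * integral of (y - x)^k * density dy, by the trapezoid rule
    # on an interval of +/- width standard deviations around the mean.
    sd = math.sqrt(b * dt)
    lo = x + a * dt - width * sd
    h = 2 * width * sd / (n - 1)
    total = 0.0
    for i in range(n):
        y = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * (y - x) ** k * density(dt, x, y, a, b)
    return total * h / dt

drift = scaled_moment(1, 1e-3)      # close to a = 0.7
diffusion = scaled_moment(2, 1e-3)  # close to b = 2.0, up to a term of order dt
```

Repeating the call with smaller `dt` shows the second value settling on b, which is the sense in which b(t, x) measures the mean squared displacement per unit time.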

We conclude this section with a consideration of a special case of
the Kolmogorov equations when the function f(t, x; τ, y) depends
on t, τ and y - x, but not on x and y themselves. Physically, this
means that the process is homogeneous in space: the probability of the
increment Δ = y - x is independent of the position x that the system
was in at time t. Obviously, in this case the functions a(t, x) and
b(t, x) do not depend on x and are functions solely of the argument t:

\[ a(t) = a(t, x), \qquad b(t) = b(t, x) \]

In our case, the Kolmogorov equations may be rewritten as

\[ \left. \begin{aligned}
\frac{\partial f}{\partial t} &= -a(t)\, \frac{\partial f}{\partial x} - \frac{1}{2}\, b(t)\, \frac{\partial^2 f}{\partial x^2} \\
\frac{\partial f}{\partial \tau} &= -a(\tau)\, \frac{\partial f}{\partial y} + \frac{1}{2}\, b(\tau)\, \frac{\partial^2 f}{\partial y^2}
\end{aligned} \right\} \tag{11} \]





We first consider the special case when a(t) = 0 and b(t) = 1.
Then Equations (11) become the equation of heat conduction

\[ \frac{\partial f}{\partial \tau} = \frac{1}{2}\, \frac{\partial^2 f}{\partial y^2} \]

and its adjoint

\[ \frac{\partial f}{\partial t} = -\frac{1}{2}\, \frac{\partial^2 f}{\partial x^2} \tag{12} \]

From the general theory of the heat-conduction equation it is
known that the only solution of these equations that satisfies the
conditions (10) is given by the function

\[ f(t, x; \tau, y) = \frac{1}{\sqrt{2\pi(\tau - t)}}\, e^{-\frac{(y-x)^2}{2(\tau - t)}} \]


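The change of variables

\[ x' = x - \int_a^t a(z)\, dz, \qquad y' = y - \int_a^\tau a(z)\, dz, \qquad t' = \int_a^t b(z)\, dz, \qquad \tau' = \int_a^\tau b(z)\, dz \]

reduces (11) to Equations (12). This makes it possible to write
the desired solution of Equations (11) as

\[ f(t, x; \tau, y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y - x - A)^2}{2\sigma^2}} \]

where

\[ A = \int_t^\tau a(z)\, dz, \qquad \sigma^2 = \int_t^\tau b(z)\, dz \]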
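
The claim that this normal density solves both equations in (11) can be checked numerically. The sketch below assumes particular coefficients a(t) = 0.5 + 0.2t and b(t) = 1 + 0.5t (hypothetical, chosen only so that A and σ² have closed forms) and evaluates the residuals of the two equations with central finite differences. The helper names `d1`, `d2` and `residuals` are illustrative.

```python
import math

def a(t):
    # Hypothetical drift coefficient a(t).
    return 0.5 + 0.2 * t

def b(t):
    # Hypothetical diffusion coefficient b(t) > 0.
    return 1.0 + 0.5 * t

def A(t, tau):
    # Integral of a(z) from t to tau, in closed form.
    return 0.5 * (tau - t) + 0.1 * (tau ** 2 - t ** 2)

def sigma2(t, tau):
    # Integral of b(z) from t to tau, in closed form.
    return (tau - t) + 0.25 * (tau ** 2 - t ** 2)

def f(t, x, tau, y):
    s2 = sigma2(t, tau)
    return math.exp(-(y - x - A(t, tau)) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

def d1(g, u, h=1e-4):
    # Central difference for the first derivative.
    return (g(u + h) - g(u - h)) / (2 * h)

def d2(g, u, h=1e-3):
    # Central difference for the second derivative.
    return (g(u + h) - 2 * g(u) + g(u - h)) / (h * h)

def residuals(t, x, tau, y):
    # First equation of (11): df/dt + a(t) df/dx + (b(t)/2) d2f/dx2 = 0.
    back = (d1(lambda s: f(s, x, tau, y), t)
            + a(t) * d1(lambda u: f(t, u, tau, y), x)
            + 0.5 * b(t) * d2(lambda u: f(t, u, tau, y), x))
    # Second equation of (11): df/dtau + a(tau) df/dy - (b(tau)/2) d2f/dy2 = 0.
    fwd = (d1(lambda s: f(t, x, s, y), tau)
           + a(tau) * d1(lambda u: f(t, x, tau, u), y)
           - 0.5 * b(tau) * d2(lambda u: f(t, x, tau, u), y))
    return back, fwd

back, fwd = residuals(0.3, 0.1, 1.2, 0.8)  # both should be near zero
```

Both residuals vanish up to discretization error; setting a(t) = 0 and b(t) = 1 in the same code reproduces the heat-conduction case (12).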


Sec. 55. Purely Discontinuous Stochastic Processes. 

The Kolmogorov-Feller Equations 

In modern natural science an important role is played by processes
in which changes occur in a system in jumps and not continuously.
Some problems of this nature were given in Sec. 50.

We say that a stochastic process ξ(t) is purely discontinuous if
in the course of some time interval (t, t+Δt) the quantity ξ(t) remains
unchanged and equal to x with probability 1 - p(t, x)Δt + o(Δt) and
can undergo change only with probability p(t, x)Δt + o(Δt) [here
we assume that the probability of more than one change of ξ(t) during
the time interval Δt is o(Δt)]. Quite naturally, since we confine ourselves
to processes without aftereffect, the distribution function of
changes of ξ(t) following a jump is no longer dependent on the value
that ξ(t) had at various times prior to the jump.

Denote by P(t, x, y) the conditional distribution function of ξ(t)
provided that a jump occurred at time t and that immediately prior
to the jump, ξ(t) was equal to x (that is, ξ(t-0) = x).

The distribution function F(t, x; τ, y) can readily be expressed
in terms of the functions p(t, x) and P(t, x, y), namely:

\[ F(t, x; \tau, y) = [1 - p(t, x)(\tau - t)]\, E(x, y) + (\tau - t)\, p(t, x)\, P(t, x, y) + o(\tau - t) \tag{1} \]

According to definition, the functions p(t, x) and P(t, x, y)
are nonnegative; and since P(t, x, y) is a distribution function,
the equalities

\[ P(t, x, -\infty) = 0, \qquad P(t, x, +\infty) = 1 \]

hold.

Besides, we will assume that p(t, x) is bounded and that both
the functions p(t, x) and P(t, x, y) are continuous in t and x
(actually, it is sufficient to assume that they are Borel measurable
in x).

We make no assumptions with respect to the function F(t, x; τ, y)
and only retain its definition for τ = t:

\[ \lim_{\tau \to t+0} F(t, x; \tau, y) = \lim_{t \to \tau-0} F(t, x; \tau, y) = E(x, y) = \begin{cases} 0 & \text{for } y \le x \\ 1 & \text{for } y > x \end{cases} \]

One of the problems of this section is to prove the following 
theorem. 

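Theorem. The distribution function F(t, x; τ, y) of a purely
discontinuous process without aftereffect satisfies the following two
integro-differential equations:

\[ \frac{\partial F(t, x; \tau, y)}{\partial t} = p(t, x) \left[ F(t, x; \tau, y) - \int F(t, z; \tau, y)\, d_z P(t, x, z) \right] \tag{2} \]

\[ \frac{\partial F(t, x; \tau, y)}{\partial \tau} = -\int_{-\infty}^{y} p(\tau, z)\, d_z F(t, x; \tau, z) + \int_{-\infty}^{\infty} p(\tau, z)\, P(\tau, z, y)\, d_z F(t, x; \tau, z) \tag{3} \]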

Equation (2) was obtained by A. N. Kolmogorov in 1931; under 
our assumptions, both Equations (2) and (3) were obtained by 
W. Feller in 1937. It is therefore natural to call Equations (2) 
and (3) Kolmogorov-Feller equations. 





Proof. By virtue of the generalized Markov equation,

\[ F(t, x; \tau, y) = \int F(t+\Delta t, z; \tau, y)\, d_z F(t, x; t+\Delta t, z) \]

Substituting the value of F(t, x; t+Δt, z) from formula (1), we find

\[ \begin{aligned}
F(t, x; \tau, y) &= \int F(t+\Delta t, z; \tau, y)\, d_z \{ [1 - p(t, x)\Delta t + o(\Delta t)]\, E(x, z) \} \\
&\quad + \int F(t+\Delta t, z; \tau, y)\, d_z \{ [p(t, x)\Delta t + o(\Delta t)]\, P(t, x, z) \}
\end{aligned} \]

Since

\[ \int F(t+\Delta t, z; \tau, y)\, d_z E(x, z) = F(t+\Delta t, x; \tau, y) \]

it follows that

\[ F(t, x; \tau, y) = [1 - p(t, x)\Delta t]\, F(t+\Delta t, x; \tau, y) + \Delta t\, p(t, x) \int F(t+\Delta t, z; \tau, y)\, d_z P(t, x, z) + o(\Delta t) \]

And from this we get

\[ \frac{F(t+\Delta t, x; \tau, y) - F(t, x; \tau, y)}{\Delta t} = p(t, x)\, F(t+\Delta t, x; \tau, y) - p(t, x) \int F(t+\Delta t, z; \tau, y)\, d_z P(t, x, z) + o(1) \]

Passing to the limit gives us (2).

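The Markov equation and (1), and also the definition of the
function E(x, z) enable us to write the following chain of equations:

\[ \begin{aligned}
F(t, x; \tau+\Delta\tau, y) &= \int F(\tau, z; \tau+\Delta\tau, y)\, d_z F(t, x; \tau, z) \\
&= \int \{ [1 - p(\tau, z)\Delta\tau]\, E(z, y) + \Delta\tau\, p(\tau, z)\, P(\tau, z, y) + o(\Delta\tau) \}\, d_z F(t, x; \tau, z) \\
&= \int_{-\infty}^{y} d_z F(t, x; \tau, z) - \Delta\tau \int_{-\infty}^{y} p(\tau, z)\, d_z F(t, x; \tau, z) \\
&\quad + \Delta\tau \int_{-\infty}^{\infty} p(\tau, z)\, P(\tau, z, y)\, d_z F(t, x; \tau, z) + o(\Delta\tau)
\end{aligned} \]

Equation (3) and the existence of the derivative ∂F/∂τ follow from
this in the ordinary way.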

We shall solve yet another problem that is important in applications:
What is the probability of a system changing its state n times
(n = 0, 1, 2, ...) during a time interval from t to τ (τ > t)?

Denote by p_n(t, x, τ) the probability that a system starting
from state x at time t will change its state n times up to time τ.
We begin the solution with n = 0.

Write the following equation:

\[ p_0(t, x, \tau) = p_0(t, x, \tau+\Delta\tau) + p_0(t, x, \tau)\, [1 - p_0(\tau, x, \tau+\Delta\tau)] \tag{4} \]

which states that the absence of changes in the state of the system
during the time interval (t, τ) can take place in two mutually
exclusive ways: (1) the system has not changed its state during a
larger interval of time (t, τ+Δτ); (2) the system did not change
its state prior to time τ, but during the time interval (τ, τ+Δτ)
its state changed. Since by definition of a purely discontinuous
process

\[ p_0(\tau, x, \tau+\Delta\tau) = 1 - p(\tau, x)\Delta\tau + o(\Delta\tau) \]

equation (4) may be written otherwise:

\[ \frac{p_0(t, x, \tau+\Delta\tau) - p_0(t, x, \tau)}{\Delta\tau} = -p_0(t, x, \tau)\, p(\tau, x) + o(1) \]

Whence, letting Δτ → 0, we find that the derivative ∂p_0(t, x, τ)/∂τ
exists and that

\[ \frac{\partial p_0(t, x, \tau)}{\partial \tau} = -p_0(t, x, \tau)\, p(\tau, x) \]

Integrating this equation, we find

\[ p_0(t, x, \tau) = C e^{-\int_t^\tau p(u, x)\, du} \]

Since

\[ p_0(t, x, t) = 1 \]

it follows that C = 1 and

\[ p_0(t, x, \tau) = e^{-\int_t^\tau p(u, x)\, du} \tag{5} \]

We will now see that knowing p_0(t, x, τ) and also the function
P(t, x, y) defined earlier, we can calculate any probability p_n(t, x, τ).
Indeed, an n-fold change of the state occurs in the following manner:

(1) prior to time s (t < s < τ) the system does not change its state
[the probability of this event is equal to p_0(t, x, s)];

(2) during the interval (s, s+Δs) the system changes its state [the
probability of this is equal to p_1(s, x, s+Δs) = p(s, x)Δs + o(Δs)];

(3) the probability that the new state at which the system will
arrive will lie between y and y+Δy is equal to

\[ P(s, x, y+\Delta y) - P(s, x, y) = \Delta_y P(s, x, y) \]

(4) finally, during time (s+Δs, τ) the system will change its state
n - 1 times [the probability of this event is p_{n-1}(s+Δs, y, τ)].

The probability that all four of these events will occur is, by
the multiplication theorem, equal to

\[ p_0(t, x, s)\, [p(s, x) + o(1)]\, \Delta s \cdot \Delta_y P(s, x, y)\, p_{n-1}(s+\Delta s, y, \tau) \]

Since s and y may be arbitrary (t < s < τ and -∞ < y < ∞),
then by the formula of total probability

\[ \begin{aligned}
p_n(t, x, \tau) &= \int_t^\tau \int p_0(t, x, s)\, p(s, x)\, p_{n-1}(s, y, \tau)\, d_y P(s, x, y)\, ds \\
&= \int_t^\tau p_0(t, x, s)\, p(s, x) \int p_{n-1}(s, y, \tau)\, d_y P(s, x, y)\, ds
\end{aligned} \tag{6} \]

Whence, in particular,

\[ p_1(t, x, \tau) = \int_t^\tau p_0(t, x, s)\, p(s, x) \int p_0(s, y, \tau)\, d_y P(s, x, y)\, ds \tag{7} \]

The procedure for determining p_n(t, x, τ) is obvious: by formula
(5) we find p_0(t, x, τ), by formula (7) we compute p_1(t, x, τ) and,
in succession, p_2(t, x, τ), p_3(t, x, τ) and, finally, p_n(t, x, τ).
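
The successive procedure can be sketched numerically. The code below is a minimal illustration under two assumptions that are not part of the general scheme: the intensity p(t, x) is a constant a, and every jump leads to one definite next state, so the inner integral in (6) reduces to p_{n-1}(s, ·, τ) and the state argument can be dropped. The function names and the grid size are arbitrary choices. Under these assumptions the answer should agree with the Poisson probabilities, which the helper `poisson` provides for comparison.

```python
import math

def jump_count_probs(a, t, tau, n_max, steps=400):
    """Return [p_0, ..., p_n_max] for the interval (t, tau) via recursion (6)."""
    h = (tau - t) / steps
    grid = [t + i * h for i in range(steps + 1)]
    # Formula (5) with constant intensity: p_0(s, ., tau) = exp(-a (tau - s)).
    prev = [math.exp(-a * (tau - s)) for s in grid]
    probs = [prev[0]]
    for _ in range(n_max):
        cur = []
        for i, s in enumerate(grid):
            # Integrand of (6) on [s, tau]: p_0(s, ., u) * a * p_{n-1}(u, ., tau).
            vals = [math.exp(-a * (grid[j] - s)) * a * prev[j]
                    for j in range(i, steps + 1)]
            # Trapezoid rule; a single node means an interval of zero length.
            if len(vals) > 1:
                cur.append(h * (sum(vals) - 0.5 * (vals[0] + vals[-1])))
            else:
                cur.append(0.0)
        probs.append(cur[0])
        prev = cur
    return probs

def poisson(n, mean):
    # Poisson probability with the given mean, used only as a reference value.
    return math.exp(-mean) * mean ** n / math.factorial(n)

probs = jump_count_probs(a=1.5, t=0.0, tau=2.0, n_max=4)
```

Each pass of the outer loop is one application of (6); the list `prev` carries p_{n-1}(s, ·, τ) for every intermediate time s, which is exactly what the next step of the recursion needs.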

Example 1. Let the variable ξ(t) that interests us be the number
of changes of the state during time from 0 to t. Assuming
that p(t, x) = a, where a is a positive constant, find p_n(t, x, τ).

In our case, possible states of the system will be all the nonnegative
integers (x = 0, 1, 2, ...) and only these integers. Since in
each change of state the variable ξ(t) increases exactly by 1, it
follows that

\[ P(t, x, y) = \begin{cases} 0 & \text{for } y \le x+1 \\ 1 & \text{for } y > x+1 \end{cases} \]

From formula (5) we have

\[ p_0(t, x, \tau) = e^{-a(\tau - t)} \]

According to (7),

\[ p_1(t, x, \tau) = \int_t^\tau p_0(t, x, s)\, p(s, x)\, p_0(s, x+1, \tau)\, ds = a(\tau - t)\, e^{-a(\tau - t)} \]

By formula (6)

\[ p_2(t, x, \tau) = \int_t^\tau p_0(t, x, s)\, p(s, x)\, p_1(s, x+1, \tau)\, ds = \frac{[a(\tau - t)]^2}{2!}\, e^{-a(\tau - t)} \]

Now suppose that

\[ p_{n-1}(t, x, \tau) = \frac{[a(\tau - t)]^{n-1}}{(n-1)!}\, e^{-a(\tau - t)} \]

By formula (6)

\[ p_n(t, x, \tau) = \int_t^\tau p_0(t, x, s)\, p(s, x)\, p_{n-1}(s, x+1, \tau)\, ds = \frac{a^n e^{-a(\tau - t)}}{(n-1)!} \int_t^\tau (\tau - s)^{n-1}\, ds = \frac{[a(\tau - t)]^n}{n!}\, e^{-a(\tau - t)} \]

This proves that for any integer n ≥ 0

\[ p_n(t, x, \tau) = \frac{[a(\tau - t)]^n}{n!}\, e^{-a(\tau - t)} \]

The solution of our problem is therefore the Poisson law. In particular,

\[ p_n(0, 0, \tau) = \frac{(a\tau)^n}{n!}\, e^{-a\tau} \]


It is easy to see that the function

\[ F(t, x; \tau, y) = \begin{cases} 0 & \text{for } y \le x \\ \sum\limits_{n < y - x} p_n(t, x, \tau) & \text{for } y > x \end{cases} \]

is the solution of integro-differential Equations (2) and (3).
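
This statement can be checked numerically. The sketch below assumes that F is the Poisson distribution function of the increment, F(t, x; τ, y) = Σ_{n < y - x} p_n(t, x, τ), with a hypothetical constant intensity `A_RATE`, and that each jump raises the state by exactly 1. Under these assumptions the integrals in (2) and (3) collapse to single values of F, and the time derivatives are approximated by central differences.

```python
import math

A_RATE = 1.3  # hypothetical constant intensity a

def F(t, x, tau, y):
    """P{xi(tau) < y | xi(t) = x}: Poisson distribution function of the increment."""
    mu = A_RATE * (tau - t)
    total, n = 0.0, 0
    while n < y - x:
        total += math.exp(-mu) * mu ** n / math.factorial(n)
        n += 1
    return total

def backward_gap(t, x, tau, y, h=1e-5):
    # Equation (2): the jump law puts all its mass at z = x + 1,
    # so the integral of F(t, z; tau, y) d_z P(t, x, z) equals F(t, x + 1; tau, y).
    lhs = (F(t + h, x, tau, y) - F(t - h, x, tau, y)) / (2 * h)
    rhs = A_RATE * (F(t, x, tau, y) - F(t, x + 1, tau, y))
    return abs(lhs - rhs)

def forward_gap(t, x, tau, y, h=1e-5):
    # Equation (3): P(tau, z, y) = 1 exactly when z + 1 < y,
    # so the second integral equals A_RATE * F(t, x; tau, y - 1).
    lhs = (F(t, x, tau + h, y) - F(t, x, tau - h, y)) / (2 * h)
    rhs = -A_RATE * F(t, x, tau, y) + A_RATE * F(t, x, tau, y - 1)
    return abs(lhs - rhs)

gaps = backward_gap(0.4, 2, 1.9, 5.5), forward_gap(0.4, 2, 1.9, 5.5)
```

Both gaps are at the level of finite-difference error, which is consistent with F solving the two Kolmogorov-Feller equations in this special case.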

Example 2. At time t = 0 there are N radioactive atoms. The
probability of decay of an atom in the time interval (t, t+Δt)
is equal to aN(t)Δt + o(Δt), where a > 0 is a constant and N(t)
is the number of atoms that have not decayed up to time t. Find
the probability that during the time between t and τ there will
be n disintegrations*.

This is a typical purely discontinuous stochastic process. The
quantity n can, quite understandably, take only the values 0, 1,
2, ..., N(t).


* We assume here that the decay products do not disintegrate and at any 
rate do not affect the atoms that have not yet disintegrated. 





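By the condition of the problem

\[ p(t, x) = \begin{cases} 0 & \text{for } x < 0 \text{ and } x > N \\ a(N - x) & \text{for } 0 \le x \le N \end{cases} \]

and

\[ P(t, x, y) = \begin{cases} 0 & \text{for } y \le x+1 \\ 1 & \text{for } y > x+1 \end{cases} \]

Let us first evaluate the probability that during the time from
0 to τ there will be n disintegrations. By formula (5)

\[ p_0(0, 0, \tau) = e^{-\int_0^\tau p(t, 0)\, dt} = e^{-aN\tau} \]

In exactly the same way we have

\[ p_0(s, 1, \tau) = e^{-a(N-1)(\tau - s)} \]

Then, by formula (7),

\[ \begin{aligned}
p_1(0, 0, \tau) &= \int_0^\tau p_0(0, 0, s)\, p(s, 0)\, p_0(s, 1, \tau)\, ds \\
&= \int_0^\tau e^{-aNs}\, aN\, e^{-a(N-1)(\tau - s)}\, ds \\
&= aN e^{-a(N-1)\tau} \int_0^\tau e^{-as}\, ds = N e^{-aN\tau} (e^{a\tau} - 1)
\end{aligned} \tag{8} \]

By formula (6), it is easy to find, in succession, p_2(0, 0, τ),
p_3(0, 0, τ), and so on and to prove that

\[ p_n(0, 0, \tau) = C_N^n\, e^{-aN\tau}\, [e^{a\tau} - 1]^n \tag{9} \]

This is left to the reader.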

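Obviously, when 0 ≤ n ≤ N - k we have the following equation:

\[ p_n(t, k, \tau) = C_{N-k}^n\, e^{-a(N-k)(\tau - t)}\, [e^{a(\tau - t)} - 1]^n \tag{9'} \]

We are now in a position to determine the probability we are
interested in. We denote it by p_n(t, τ). By the formula of total
probability and then by using (9) and (9'), we find

\[ \begin{aligned}
p_n(t, \tau) &= \sum_{k=0}^{N-n} p_k(0, 0, t)\, p_n(t, k, \tau) \\
&= \sum_{k=0}^{N-n} C_N^k\, e^{-aNt} (e^{at} - 1)^k\, C_{N-k}^n\, e^{-a(N-k)(\tau - t)} (e^{a(\tau - t)} - 1)^n \\
&= C_N^n\, e^{-aN\tau} (e^{a(\tau - t)} - 1)^n \sum_{k=0}^{N-n} C_{N-n}^k\, [e^{a(\tau - t)} (e^{at} - 1)]^k
\end{aligned} \]

Since

\[ C_N^k\, C_{N-k}^n = C_N^n\, C_{N-n}^k \]

and

\[ \sum_{k=0}^{N-n} C_{N-n}^k\, [e^{a(\tau - t)} (e^{at} - 1)]^k = [1 + e^{a\tau} - e^{a(\tau - t)}]^{N-n} \]

it follows, finally, that

\[ p_n(t, \tau) = C_N^n\, [e^{-at} - e^{-a\tau}]^n\, [1 - e^{-at} + e^{-a\tau}]^{N-n} \]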
It will readily be seen that the function

\[ F(t, x; \tau, y) = \begin{cases} 0 & \text{for } y \le x \\ \sum\limits_{n < y - x} p_n(t, x, \tau) & \text{for } x < y \le N \\ 1 & \text{for } y > N \end{cases} \]

is the solution of the Kolmogorov-Feller integro-differential equations.
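
The total-probability computation of Example 2 can be verified directly. The sketch below takes formulas (9) and (9') in their binomial form, p_n(0, 0, τ) = C_N^n e^{-aNτ}(e^{aτ} - 1)^n and p_n(t, k, τ) = C_{N-k}^n e^{-a(N-k)(τ-t)}(e^{a(τ-t)} - 1)^n, and compares the sum over k with the closed form, in which each atom independently decays during (t, τ) with probability e^{-at} - e^{-aτ}. The numerical values of N, a, t and τ are arbitrary, and the function names are illustrative.

```python
import math

def p_from_zero(k, N, a, t):
    # Formula (9): k disintegrations during (0, t), starting from N atoms.
    return math.comb(N, k) * math.exp(-a * N * t) * (math.exp(a * t) - 1) ** k

def p_from_state(n, k, N, a, t, tau):
    # Formula (9'): n disintegrations during (t, tau), given k decays before t.
    return (math.comb(N - k, n) * math.exp(-a * (N - k) * (tau - t))
            * (math.exp(a * (tau - t)) - 1) ** n)

def p_total(n, N, a, t, tau):
    # Formula of total probability over the number k of decays before time t.
    return sum(p_from_zero(k, N, a, t) * p_from_state(n, k, N, a, t, tau)
               for k in range(N - n + 1))

def p_closed(n, N, a, t, tau):
    # Closed form: binomial law with success probability q = e^{-at} - e^{-a tau}.
    q = math.exp(-a * t) - math.exp(-a * tau)
    return math.comb(N, n) * q ** n * (1 - q) ** (N - n)

N, a, t, tau = 12, 0.3, 0.5, 2.0
table = [(p_total(n, N, a, t, tau), p_closed(n, N, a, t, tau)) for n in range(N + 1)]
```

The two columns of `table` agree to rounding error, and the closed-form values sum to 1 over n = 0, 1, ..., N.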

Sec. 56. Homogeneous Stochastic Processes with Independent 
Increments 

We now consider an important class of stochastic processes which 
will be fully described in terms of characteristic functions. 

By a homogeneous stochastic process with independent increments is
meant a collection of random variables ξ(t) dependent on a single
real parameter t and satisfying the following two conditions:

(1) the distribution function of the variable ξ(t+t_0) - ξ(t_0) is
independent of t_0 (the process is time-homogeneous);

(2) for any finite number of nonoverlapping intervals (a, b) of
parameter t the increments of the variable ξ(t), that is, the differences
ξ(b) - ξ(a), are mutually independent (the increments are independent).

Of homogeneous stochastic processes with independent increments
particular attention has been paid to the processes of Brownian motion,
which later were also termed Wiener processes. For processes of
this type, the following two conditions are assumed to be satisfied
in addition to the two that have already been given:

(3) the variables ξ(t+t_0) - ξ(t_0) are normally distributed;

(4) M[ξ(t+t_0) - ξ(t_0)] = 0,
D[ξ(t+t_0) - ξ(t_0)] = σ²t, where σ is a constant.

In Sec. 51 we considered another homogeneous process with independent
increments, the Poisson process.

Before proceeding to obtain concrete results, let us consider several
examples. In these examples, the foregoing conditions may be
taken as a working hypothesis. Naturally, their sole justification is
agreement of the conclusions with experiment.




Example 1. Diffusion of Gases. Consider a molecule of some gas that
is moving among other molecules of the same gas under conditions
of constant temperature and density. We introduce Cartesian coordinates
and see how, in the course of time, one of the coordinates of
the chosen molecule (say the x-coordinate) varies.

Because of random collisions of this molecule with the other molecules,
the coordinate will vary in time as it receives random increments.
The requirement that the conditions of the gas be constant
obviously means that the process under study is homogeneous in time.
In view of the large number of moving molecules and of the weak
dependence of their motion, this is a process with independent increments.

Example 2. Molecular Speeds. Again consider a gas molecule moving 
in a volume of space filled with the molecules of some gas at constant 
density and temperature. Again refer the entire space to Cartesian 
coordinates and follow the time-variations of the velocity component 
along one of the coordinate axes. In its motion the molecule will 
be subject to random collisions with other molecules. The velocity 
component, due to these collisions, will receive random increments. 
Again we have a homogeneous stochastic process with independent 
increments. 

Example 3. Radioactive Disintegration. The essence of radioactivity
of a substance is known to consist in the fact that the atoms of
a substance are converted into the atoms of another substance and
considerable quantities of energy are released. Observations of comparatively
large masses of a radioactive substance show that the various
atoms disintegrate independently of one another so that the
numbers of disintegrations of atoms in nonoverlapping intervals of
time are mutually independent. Besides, the probabilities that during
a time interval of definite length there will occur a certain number
of disintegrations depend on the length of the interval and are
practically independent of its location in time. Actually, of course,
the radioactivity of the substance gradually falls off as its mass diminishes
in size. However, for comparatively small time intervals
(and for quantities of substance not excessively great) this change is
so insignificant that it can definitely be neglected.

Any number of instances may be given in which the natural phenomenon
or technological process that interests us is a homogeneous
process with independent increments. A few examples are: cosmic
radiation (the number of cosmic-ray particles impinging on a definite
area during a specific time interval), the breaking of yarn on a ring
spinning frame, the working load of a telephone operator (the number
of calls in a certain interval of time) and so forth.

Let us now clarify the characteristic property of homogeneous 
stochastic processes with independent increments. 





Denote by F(x; τ) the distribution function of the increment in
the variable ξ(t) during a time interval of duration τ. Then, if the time
intervals of duration τ_1 and τ_2 do not overlap, we have

\[ F(x; \tau_1 + \tau_2) = \int F(x - y; \tau_1)\, d_y F(y; \tau_2) \tag{1} \]

If f(z; τ) is the characteristic function, that is if

\[ f(z; \tau) = \int e^{izx}\, d_x F(x; \tau) \]

it follows that Equation (1) will, in terms of characteristic functions,
take on the following form:

\[ f(z; \tau_1 + \tau_2) = f(z; \tau_1)\, f(z; \tau_2) \tag{1'} \]

Generally speaking, if the time intervals τ_1, τ_2, ..., τ_n do not
overlap, then

\[ f\left(z; \sum_{k=1}^{n} \tau_k\right) = \prod_{k=1}^{n} f(z; \tau_k) \]

In particular, if τ_1 = τ_2 = ... = τ_n and \(\sum_{k=1}^{n} \tau_k = \tau\), then

\[ f(z; \tau) = \left[ f\left(z; \frac{\tau}{n}\right) \right]^n \]

Thus, the distribution function of any homogeneous stochastic process
with independent increments is infinitely divisible.
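
The multiplicative property (1') and its consequence f(z; τ) = [f(z; τ/n)]^n can be illustrated on the two processes already met in the text. The sketch below uses the standard characteristic functions of the Wiener process, exp(-σ²τz²/2), and of the Poisson process, exp(λτ(e^{iz} - 1)); the values of σ and λ and the function names are assumptions made for the example.

```python
import cmath

def cf_wiener(z, tau, sigma=1.3):
    # Characteristic function of a normal increment with variance sigma^2 * tau.
    return cmath.exp(-0.5 * sigma ** 2 * tau * z ** 2)

def cf_poisson(z, tau, lam=2.0):
    # Characteristic function of a Poisson increment with mean lam * tau.
    return cmath.exp(lam * tau * (cmath.exp(1j * z) - 1))

def max_gap(cf, tau, n, zs):
    # Largest deviation between f(z; tau) and [f(z; tau / n)]^n over the test points.
    return max(abs(cf(z, tau) - cf(z, tau / n) ** n) for z in zs)

zs = [-3.0, -1.0, 0.5, 2.0]
gap_wiener = max_gap(cf_wiener, 1.7, 7, zs)
gap_poisson = max_gap(cf_poisson, 1.7, 7, zs)
```

Both gaps are zero up to rounding, for any n, which is the numerical face of infinite divisibility for these two laws.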

It should be pointed out that it was through the study of homogeneous
processes with independent increments that infinitely divisible
distribution laws began to be studied in the theory of probability.
We have seen that the theory of infinitely divisible distribution laws
exerted a decisive effect on the development of the classical problems
of probability theory concerning the summation of random variables.
As we have already pointed out, whereas before the interests of investigators
centred on determining the broadest possible conditions
for the law of large numbers and the convergence of normalized sums
to the normal law, after A. N. Kolmogorov fully characterized the
class of laws that govern homogeneous stochastic processes without
aftereffect, it was quite natural for those general problems to emerge
that were considered in the preceding chapter. And here it was found
that the basic distribution laws, which earlier were obtained as asymptotic
laws, in the theory of stochastic processes play the role of exact
solutions of the appropriate functional equations. More than that,
this new point of view permitted clarifying the causes by virtue of
which in the classical theory of probability only two limiting distribution
functions (the normal law and the Poisson law) were considered.





Since for arbitrary τ > 0 in homogeneous processes with independent
increments

\[ f(z; \tau) = [f(z; 1)]^{\tau} \]

it follows that such processes are fully determined by specification
of the characteristic function of the variable ξ(1) - ξ(0). In Sec. 45
we saw that, for infinitely divisible laws with finite variance,

\[ \log f(z; 1) = i\gamma z + \int \{ e^{izu} - 1 - izu \}\, \frac{1}{u^2}\, dG(u) \tag{2} \]

where γ is a real constant and G(u) is a nondecreasing function
of bounded variation. We shall consider only this special case of
homogeneous processes.

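We introduce the following notations into formula (2):

\[ M(u) = \int_{-\infty}^{u} \frac{1}{x^2}\, dG(x) \quad \text{for } u < 0 \]

\[ N(u) = -\int_{u}^{\infty} \frac{1}{x^2}\, dG(x) \quad \text{for } u > 0 \]

\[ \sigma^2 = G(+0) - G(-0) \]

Then it will take the form

\[ \log f(z; 1) = i\gamma z - \frac{\sigma^2 z^2}{2} + \int_{-\infty}^{0} \{ e^{izu} - 1 - izu \}\, dM(u) + \int_{0}^{\infty} \{ e^{izu} - 1 - izu \}\, dN(u) \tag{2'} \]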

0 

Let us now clarify the probabilistic meaning of the functions
M(u) and N(u).

In deriving the formulas of the canonical representation of infinitely
divisible laws in Sec. 45, we introduced the function

\[ G_n(u) = n \int_{-\infty}^{u} x^2\, d\Phi_n(x) \]

We put

\[ M_n(u) = \int_{-\infty}^{u} \frac{1}{x^2}\, dG_n(x) = n\Phi_n(u) \quad \text{for } u < 0 \]

and

\[ N_n(u) = -\int_{u}^{\infty} \frac{1}{x^2}\, dG_n(x) = -n[1 - \Phi_n(u)] \quad \text{for } u > 0 \]

From the fact that as n → ∞ at continuity points of the function G(u)

\[ G_n(u) \to G(u) \]

we conclude from Helly’s second theorem that at the continuity
points of the function M(u)

\[ M_n(u) = n\Phi_n(u) \to M(u) \]

From the viewpoint of stochastic processes, Φ_n(x) (x < 0) is the
probability that the variable ξ(τ) will receive a negative increment
greater than |x| in absolute value, in the interval (k/n, (k+1)/n) of variation
of the parameter τ. Thus, M_n(x) is the sum over all k from 0 to n - 1
of the probabilities that the variable ξ(τ) will acquire a negative increment
(in jumps that are in absolute value greater than |x|) in the
interval (k/n, (k+1)/n) of variation of the parameter τ. Since M(u) and
N(u) are the limits of the functions M_n(u) and N_n(u) respectively
as n → ∞, they are referred to as jump functions.

If M(u) ≡ 0 (for u < 0) and N(u) ≡ 0 (for u > 0), that is, there are
no jump functions, then it will be seen from formula (2') that in this
case the stochastic process is governed by the normal law. We see
that a stochastic process governed by the normal law is continuous
in the meaning of probability theory. We shall now prove a stronger
assertion.

Theorem. For a homogeneous stochastic process with independent
increments and finite variance* to be governed by the normal law**,
it is necessary and sufficient that for arbitrary ε > 0 the probability
that the maximum of the absolute value of increments in ξ(τ)
during the intervals ((k-1)/n, k/n) (k = 1, 2, ..., n) will exceed ε
should tend to zero together with 1/n***.

Proof. We have just seen that a homogeneous stochastic process
with independent increments is governed by the normal law if
and only if, for x > 0,

\[ M(-x) = N(x) = 0 \tag{3} \]

Since

\[ M(u) = \lim_{n \to \infty} M_n(u) \quad \text{and} \quad N(u) = \lim_{n \to \infty} N_n(u) \]


* The theorem is true even without assuming the variance to be finite. 

** In particular, by the normal law with variance zero, which is a law like
F(x) = 0 for x ≤ a, F(x) = 1 for x > a.

*** Thus, only processes governed by the normal law are “uniformly continuous”
in the probabilistic sense.





it follows that the condition (3) is equivalent to the following: for u > 0,

\[ \lim_{n \to \infty} n\Phi_n(-u) = \lim_{n \to \infty} n[1 - \Phi_n(u)] = 0 \tag{4} \]

We denote the increment of ξ(τ) in the interval ((k-1)/n, k/n) by ξ_nk;
then

\[ p_{nk} = \Phi_n(-\varepsilon) + 1 - \Phi_n(\varepsilon + 0) = P\{ |\xi_{nk}| > \varepsilon \} \]

It is obvious that the relations (4) are equivalent to the following:

\[ \lim_{n \to \infty} \sum_{k=1}^{n} p_{nk} = 0 \]

From the inequalities

\[ 1 - \sum_{k=1}^{n} p_{nk} \le \prod_{k=1}^{n} (1 - p_{nk}) \le e^{-\sum_{k=1}^{n} p_{nk}} \]

we see that the relations (4) are equivalent to the assertion that

\[ \lim_{n \to \infty} \prod_{k=1}^{n} (1 - p_{nk}) = 1 \]

which means that the probability that the inequalities |ξ_nk| ≤ ε
will be realized for all k (1 ≤ k ≤ n) tends to unity as n → ∞.

To put it otherwise, we have proved that the relations (3) hold
if and only if, as n → ∞,

\[ P\left\{ \max_{1 \le k \le n} |\xi_{nk}| > \varepsilon \right\} \to 0 \]

which is what we set out to prove.
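
The criterion of the theorem can be illustrated by contrasting the Wiener and Poisson processes on the unit interval. The sketch below is only an illustration under stated assumptions: σ = 1 for the Wiener process, intensity λ = 1 for the Poisson process, and 0 < ε < 1 so that any Poisson jump exceeds ε. Because the n increments are independent and identically distributed, the probability that the largest of them exceeds ε is 1 - (1 - p)^n, where p = P{|ξ_nk| > ε}.

```python
import math

def p_wiener(n, eps, sigma=1.0):
    # P{|xi_nk| > eps} for a normal increment with mean 0 and variance sigma^2 / n.
    return math.erfc(eps * math.sqrt(n) / (sigma * math.sqrt(2)))

def p_poisson(n, eps, lam=1.0):
    # P{|xi_nk| > eps} for 0 < eps < 1: at least one jump in an interval of length 1/n.
    return -math.expm1(-lam / n)

def prob_max_exceeds(p, n):
    # Probability that at least one of n independent increments exceeds eps.
    return 1 - (1 - p) ** n

eps = 0.1
wiener = [prob_max_exceeds(p_wiener(n, eps), n) for n in (10, 10**2, 10**4)]
poisson = [prob_max_exceeds(p_poisson(n, eps), n) for n in (10, 10**2, 10**4)]
```

For the Wiener process the values in `wiener` fall to zero as n grows, as the theorem requires of a process governed by the normal law. For the Poisson process they stay at 1 - e^{-λ}, the probability of at least one jump on the unit interval, so the criterion fails.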


Sec. 57. The Concept of a Stationary Stochastic Process.

Khinchin's Theorem on the Correlation Coefficient 

The Markovian processes, or processes without aftereffect, that
we have studied in the preceding sections do not by any means exhaust
the demands made by the natural sciences on probability theory.
Indeed, in many cases earlier states of a system exert an extremely
strong effect on the probability of its future states and this effect of
the past cannot be dismissed even in an approximate interpretation of
the problem. In principle, the situation can be corrected by changing
the concept of the state of a system via the introduction of new parameters.
For instance, if we considered a change in the position of a
particle in diffusion processes or the Brownian motion as a process
without aftereffect, this would mean that we disregard the inertia of
the particle, which quite naturally plays an essential role in these
phenomena. In our example, the situation might be rectified by
introducing the velocity of the particle into the concept of the state
of the particle in addition to its coordinates. However, there are cases
where this does not facilitate the solution of the problems. First of
all, this refers to statistical mechanics, in which indication of the position
of a point in one or another cell of phase space only yields a probabilistic
judgement about its future state. Here, a knowledge of
earlier positions of the point alters very essentially our judgements
about the future of the point. In this connection, A. Ya. Khinchin
isolated an important class of stochastic processes with aftereffect,
the so-called stationary processes, which behave homogeneously in time.

A stochastic process $\xi(t)$ is called stationary if the $n$-dimensional
distribution functions of the probabilities for two finite groups of
variables $\xi(t_1), \xi(t_2), \ldots, \xi(t_n)$ and
$\xi(t_1+u), \xi(t_2+u), \ldots, \xi(t_n+u)$
coincide and, hence, are independent of $u$. The numbers $n$ and $u$,
and also the instants of time $t_1, t_2, \ldots, t_n$ may be chosen here quite
arbitrarily.

If we introduce the notation

$$F(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) = \mathsf{P}\{\xi(t_1) < x_1, \xi(t_2) < x_2, \ldots, \xi(t_n) < x_n\}$$

then in accordance with the foregoing definition the following
equation holds for any $u$ and $n$:

$$F(x_1, x_2, \ldots, x_n; t_1+u, t_2+u, \ldots, t_n+u) = F(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n) \tag{1}$$

The distribution functions $F(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n)$ of any
stochastic process must obviously satisfy the following two conditions:

(1) the symmetry condition: the equation

$$F(x_{i_1}, x_{i_2}, \ldots, x_{i_n}; t_{i_1}, t_{i_2}, \ldots, t_{i_n}) = F(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n)$$

holds for any permutation $i_1, i_2, \ldots, i_n$ of the numbers $1, 2, \ldots, n$;

(2) the compatibility condition: if $m < n$, then for any $t_{m+1}, \ldots, t_n$

$$F(x_1, \ldots, x_m, \infty, \ldots, \infty; t_1, \ldots, t_m, t_{m+1}, \ldots, t_n) = F(x_1, \ldots, x_m; t_1, \ldots, t_m) \tag{2}$$


During recent years, the theory of stationary processes has found 
considerable applications in physics and engineering. 

Stationary processes have been found to lie at the root of certain 
acoustic phenomena, including the random noises of radio engineering, 
and also of the search for hidden periodicities in astronomy, geophysics,
and meteorology.





Steady-state technological processes frequently exhibit phenomena 
of the nature of stationary processes. As an illustration, consider the 
process of spinning. An appreciable inhomogeneity of the properties 
of spinning materials (length, strength and cross-section, etc., of the 
fibre), fluctuations in the rate and uniformity of feed of the product 
to the machines during various stages of the spinning process, and 
many other factors result in the properties of the yarn varying from 
one cross-sectional point to another. And it turns out that a knowledge 
of one or another property of the yarn in some part of the skein does 
not yield a complete knowledge of its properties in any other portion.
But since the spinning process may be regarded as a steady-state
process, the probabilistic characteristics of the quality of the
yarn constitute a stationary process.

Clearly, any numerical characteristic of a stationary process $\xi(t)$
is independent of the time $t$ and, for example, if $\xi(t)$ has a finite
variance, then, obviously, the following equations hold:

$$\mathsf{M}\xi(t+u) = \mathsf{M}\xi(t) = \mathsf{M}\xi(0) = a$$
$$\mathsf{D}\xi(t+u) = \mathsf{D}\xi(t) = \mathsf{D}\xi(0) = \sigma^2$$
$$\mathsf{M}\{\xi(t+u)\xi(t)\} = \mathsf{M}\{\xi(u)\xi(0)\}$$

This circumstance enables us, without restricting the generality
of subsequent results, to assume $a = 0$ and $\sigma = 1$ [to achieve this
it is obviously sufficient to consider the ratio $\dfrac{\xi(t) - a}{\sigma}$ in place of
$\xi(t)$].

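To take an important example, we consider a normal stationary
process. For any $n$ ($n = 1, 2, \ldots$) let the vector
$S_n = \{\xi(t_1), \xi(t_2), \ldots, \xi(t_n)\}$ be normally distributed. We assume that

$$\mathsf{M}\xi(t_j) = 0 \quad (-\infty < t_j < \infty)$$

and suppose that

$$\mathsf{M}\{\xi(t_i)\xi(t_j)\} = R(t_i - t_j)$$

where $R(0) = 1$ and $R(t)$ is an even function of $t$. The function
$R(t)$ is such that the quadratic form

$$\sum_{i=1}^{n} \sum_{j=1}^{n} R(t_i - t_j)\, u_i u_j$$

is positive definite.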

Since the characteristic function of the vector $S_n$ is

$$f_n(u_1, u_2, \ldots, u_n; t_1, t_2, \ldots, t_n) = \exp\Big\{-\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} R(t_i - t_j)\, u_i u_j\Big\}$$

and the characteristic function of the vector $S_k = \{\xi(t_1), \ldots, \xi(t_k)\}$
is, for any $k < n$, equal to

$$f_k(u_1, \ldots, u_k; t_1, \ldots, t_k) = \exp\Big\{-\frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} R(t_i - t_j)\, u_i u_j\Big\} = f_n(u_1, \ldots, u_k, 0, \ldots, 0; t_1, \ldots, t_k, t_{k+1}, \ldots, t_n)$$

we conclude that the stochastic process we have defined satisfies a
self-compatibility* condition. Besides, it is directly obvious that
this process is stationary.

A homogeneous Markov process, that is, a Markov process for which
the transition probability $F(t, x; \tau, y)$ is a function solely of the three
arguments $x$, $y$ and $\tau - t$, is also stationary.

In many problems of theory and in applications, multidimensional
distributions of type (1) are not considered; and the only use made
of the stationarity of the process is the constancy of the expectation,
the variance, and the dependence of the correlation coefficient solely
on the difference of the values of the parameter $t$. It is therefore
natural to generalize the definition of stationarity and say that a
stochastic process is stationary in the broad sense if the expectation and
variance of $\xi(t)$ are independent of $t$, and the correlation coefficient of
$\xi(t)$ and $\xi(t+u)$ is a function of $u$ alone.

Clearly, a process $\xi(t)$ cannot be determined from a knowledge
only of second moments and, consequently, such knowledge cannot
completely take the place of the theory of stochastic processes based
on a consideration of probability distributions. Nevertheless, in many
problems the theory, which is based solely on a consideration of
second moments (or, correlation theory, as it is called), proves sufficient
and yields satisfactory results.

In this section we confine ourselves to a study of the correlation
function, that is, the correlation coefficient of $\xi(t)$ and $\xi(t+u)$:

$$R(u) = \frac{\mathsf{M}\{[\xi(t+u) - \mathsf{M}\xi(t+u)]\,[\xi(t) - \mathsf{M}\xi(t)]\}}{\sqrt{\mathsf{D}\xi(t)\,\mathsf{D}\xi(t+u)}}$$

By virtue of our assumption that $a = 0$ and $\sigma = 1$, the expression
for $R(u)$ is simplified to

$$R(u) = \mathsf{M}\{\xi(u)\xi(0)\}$$

* By self-compatibility is meant the compatibility of all distribution
functions of the process.

In correlation theory a stationary stochastic process is called
continuous if

$$\mathsf{M}[\xi(t+u) - \xi(t)]^2 \to 0$$

as $u \to 0$. As follows from Chebyshev's inequality, for a continuous
process, given any $\varepsilon > 0$ and arbitrary $t$, we have in particular
the following relation:

$$\mathsf{P}\{|\xi(t+u) - \xi(t)| \ge \varepsilon\} \to 0 \quad (u \to 0)$$

As follows from the equation

$$\mathsf{M}[\xi(t+u) - \xi(t)]^2 = 2(1 - R(u))$$

for a continuous stationary process the relation

$$\lim_{u \to 0} R(u) = 1$$

holds.
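
The equation relating the mean-square increment to $R(u)$ is obtained by expanding the square. A short worked derivation, assuming only the normalization $a = 0$, $\sigma = 1$ adopted earlier in this section:

```latex
\begin{aligned}
\mathsf{M}[\xi(t+u) - \xi(t)]^2
  &= \mathsf{M}\xi^2(t+u) - 2\,\mathsf{M}\{\xi(t+u)\xi(t)\} + \mathsf{M}\xi^2(t) \\
  &= 1 - 2R(u) + 1 = 2\,(1 - R(u)).
\end{aligned}
```

Hence the left side tends to 0 as $u \to 0$ exactly when $R(u) \to 1$.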

In the case of a continuous stationary process, $R(u)$ is a continuous
function of $u$. In fact,

$$|R(u + \Delta u) - R(u)| = |\mathsf{M}\{\xi(u + \Delta u)\xi(0)\} - \mathsf{M}\{\xi(u)\xi(0)\}| = |\mathsf{M}\{\xi(0)[\xi(u + \Delta u) - \xi(u)]\}|$$

But by the Cauchy-Bunyakovsky-Schwarz inequality,

$$|\mathsf{M}\{\xi(0)[\xi(u + \Delta u) - \xi(u)]\}| \le \sqrt{\mathsf{M}\xi^2(0)\,\mathsf{M}[\xi(u + \Delta u) - \xi(u)]^2}$$

And since

$$\mathsf{M}\xi^2(0) = 1$$

and for a continuous process, as $\Delta u \to 0$,

$$\mathsf{M}[\xi(u + \Delta u) - \xi(u)]^2 \to 0$$

it follows that, as $\Delta u \to 0$, $|R(u + \Delta u) - R(u)|$ also tends to 0.
This proves our assertion.

In the theorem that will now be proved, stationarity may be
understood either in the broad or in the narrow sense.

Khinchin's Theorem. For a function $R(u)$ to be the correlation
function of a continuous stationary process, it is necessary and
sufficient that it be representable in the form

$$R(u) = \int \cos ux \, dF(x) \tag{3}$$

where $F(x)$ is some distribution function.

Proof. The condition of the theorem is necessary. Indeed, if
$R(u)$ is the correlation function of a continuous stationary process,
then it is continuous and bounded. Let us prove, in addition,
that it is positive definite. Indeed, no matter what the real numbers
$u_1, u_2, \ldots, u_n$, the complex numbers $\eta_1, \eta_2, \ldots, \eta_n$ and
the integer $n$, the following relation holds:

$$0 \le \mathsf{M}\Big|\sum_{k=1}^{n} \eta_k \xi(u_k)\Big|^2 = \mathsf{M}\Big\{\sum_{i=1}^{n} \sum_{j=1}^{n} \eta_i \bar{\eta}_j\, \xi(u_i)\xi(u_j)\Big\} = \sum_{i=1}^{n} \sum_{j=1}^{n} R(u_i - u_j)\, \eta_i \bar{\eta}_j$$

By the Bochner-Khinchin theorem (Sec. 39), it follows from this
that $R(u)$ may be represented as

$$R(u) = \int e^{iux} \, dF(x)$$

where $F(x)$ is a nondecreasing function of bounded variation.
Whence, by virtue of the fact that the function $R(u)$ is real, we
get *

$$R(u) = \int \cos ux \, dF(x)$$

Finally, taking into account the condition of continuity of the process,

$$R(+0) = 1$$

we find that

$$F(+\infty) - F(-\infty) = 1$$

or that $F(x)$ is some distribution function.

The condition is sufficient. We are given that $R(u)$ is a function
of the form (3). We have to prove that there exists a stationary
process $\xi(t)$ whose correlation function is the function $R(u)$. For this
purpose, for every integral $n$ and every group of real numbers
$t_1, t_2, \ldots, t_n$ we consider the $n$-dimensional vector

$$\xi(t_1), \xi(t_2), \ldots, \xi(t_n)$$

which is normally distributed and has the properties

$$\mathsf{M}\xi(t_1) = \mathsf{M}\xi(t_2) = \ldots = \mathsf{M}\xi(t_n) = 0$$
$$\mathsf{D}\xi(t_1) = \mathsf{D}\xi(t_2) = \ldots = \mathsf{D}\xi(t_n) = 1$$

For any $i$ and $j$ the correlation coefficient of $\xi(t_i)$ and $\xi(t_j)$ is
equal to $R(t_i - t_j)$, that is,

$$\mathsf{M}\{\xi(t_i)\xi(t_j)\} = R(t_i - t_j)$$

The form of the function $R(u)$ ensures the positive definiteness
of the quadratic form in the exponent of the $n$-dimensional normal
law. The normal stochastic process thus defined is stationary both
in the strict and the broad sense of the word.
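
The sufficiency argument can be tried out numerically. The sketch below is only an illustration under assumptions that are not in the text: it takes $F$ to be the standard normal distribution function, for which the integral of $\cos ux$ with respect to $F$ over the whole line equals $e^{-u^2/2}$, approximates that integral by the trapezoidal rule, and checks that the matrix with entries $R(t_i - t_j)$ admits a Cholesky factorization (that is, is positive definite) for a hypothetical choice of instants $t_i$. All function names are invented for this example.

```python
import math

def normal_density(x):
    """Density of the standard normal law (the assumed spectral distribution F)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def correlation_from_spectrum(u, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal approximation of the integral of cos(u x) dF(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (math.cos(u * lo) * normal_density(lo)
                   + math.cos(u * hi) * normal_density(hi))
    for k in range(1, steps):
        x = lo + k * h
        total += math.cos(u * x) * normal_density(x)
    return total * h

def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix; raises ValueError if the
    matrix is not positive definite."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

times = [0.0, 1.0, 2.0, 3.0, 4.0]     # hypothetical instants t_1, ..., t_n
cov = [[correlation_from_spectrum(ti - tj) for tj in times] for ti in times]
L = cholesky(cov)                      # succeeds when the quadratic form is positive definite
print(correlation_from_spectrum(1.0), math.exp(-0.5))
```

Multiplying `L` by a vector of independent standard normal variables would give one sample of the normal vector $\xi(t_1), \ldots, \xi(t_n)$ with the prescribed correlations.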

* In consequence of the result of Example 3, Sec. 36, the function $F(x)$ is
symmetric, that is,

$$F(x + 0) = 1 - F(-x)$$

This theorem plays a fundamental role in the theory of stationary
processes and in its applications in physics. For details, the
reader should study the specialized literature.

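Example 1. Let

$$\xi(t) = \xi \cos \lambda t + \eta \sin \lambda t$$

where $\xi$ and $\eta$ are uncorrelated * random variables for which
$\mathsf{M}\xi = \mathsf{M}\eta = 0$, $\mathsf{D}\xi = \mathsf{D}\eta = 1$, and $\lambda$ is a constant.

Since

$$\begin{aligned}
\mathsf{M}\{\xi(t+u)\xi(t)\} &= \mathsf{M}\{[\xi \cos \lambda(t+u) + \eta \sin \lambda(t+u)]\,[\xi \cos \lambda t + \eta \sin \lambda t]\} \\
&= \mathsf{M}\{\xi^2 \cos \lambda t \cos \lambda(t+u) + \xi\eta\,[\sin \lambda(t+u) \cos \lambda t + \cos \lambda(t+u) \sin \lambda t] + \eta^2 \sin \lambda t \sin \lambda(t+u)\} \\
&= \cos \lambda t \cos \lambda(t+u) + \sin \lambda t \sin \lambda(t+u) = \cos \lambda u
\end{aligned}$$

it follows that the process $\xi(t)$ is stationary in the broad sense.
For this case, we have to put, in formula (3),

$$F(x) = \begin{cases} 0 & \text{for } x \le -\lambda \\ 1/2 & \text{for } -\lambda < x \le \lambda \\ 1 & \text{for } x > \lambda \end{cases}$$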
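
The computation in Example 1 can be checked by simulation. The following is a minimal sketch, not part of the original text: it assumes that the two random coefficients are independent standard normal variables (one of many choices with zero mean, unit variance and zero correlation), and the values of `lam`, `u` and the instants `t` are arbitrary. The estimate should not depend on `t` and should be close to $\cos \lambda u$.

```python
import math
import random

def estimate_correlation(lam, t, u, n_samples=200_000, seed=0):
    """Monte Carlo estimate of M{xi(t+u) xi(t)} for the process of Example 1."""
    rng = random.Random(seed)
    c_t, s_t = math.cos(lam * t), math.sin(lam * t)
    c_tu, s_tu = math.cos(lam * (t + u)), math.sin(lam * (t + u))
    total = 0.0
    for _ in range(n_samples):
        # Assumption: xi and eta are independent standard normal variables.
        xi, eta = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += (xi * c_tu + eta * s_tu) * (xi * c_t + eta * s_t)
    return total / n_samples

lam, u = 2.0, 0.7
estimates = {t: estimate_correlation(lam, t, u) for t in (0.0, 1.3, 5.0)}
exact = math.cos(lam * u)
for t, value in estimates.items():
    print(f"t = {t}: estimate {value:.4f}, cos(lam*u) = {exact:.4f}")
```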

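* The random variables $\xi$ and $\eta$ are termed uncorrelated if $\mathsf{M}\xi\eta = \mathsf{M}\xi \cdot \mathsf{M}\eta$.

Example 2. Let

$$\xi(t) = \sum_{k=1}^{n} b_k \xi_k(t)$$

where $\xi_k(t) = \xi_k \cos \lambda_k t + \eta_k \sin \lambda_k t$; $\lambda_k$ are constants, $\sum_{k=1}^{n} b_k^2 = 1$,
and the random variables $\xi_k$ and $\eta_k$ satisfy the following conditions:

$$\mathsf{M}\xi_k = \mathsf{M}\eta_k = 0, \quad \mathsf{D}\xi_k = \mathsf{D}\eta_k = 1 \quad (k = 1, 2, \ldots, n)$$
$$\mathsf{M}\xi_i\xi_j = \mathsf{M}\eta_i\eta_j = 0 \quad \text{for } i \ne j$$
$$\mathsf{M}\xi_i\eta_j = 0 \quad \text{for } i, j = 1, \ldots, n$$

It is easy to compute that the correlation function for $\xi(t)$ is
equal to

$$R(u) = \sum_{k=1}^{n} b_k^2 \cos \lambda_k u$$

and that, consequently, the process $\xi(t)$ is a stationary process in
the broad sense of the word. The function $F(x)$ in formula (3)
only increases at the points $\pm\lambda_k$ and has jumps of magnitude $\frac{1}{2} b_k^2$
at these points.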
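
The statement about the jumps of $F(x)$ can be verified directly: for a pure jump function, the Stieltjes integral in formula (3) is a finite sum over the jump points. The sketch below uses hypothetical coefficients and frequencies (chosen so that the squares of the coefficients sum to 1) and compares that sum with the correlation function of Example 2; the function names are invented for the illustration.

```python
import math

def correlation(u, b, lam):
    """R(u) = sum_k b_k^2 cos(lam_k u), the correlation function of Example 2."""
    return sum(bk ** 2 * math.cos(lk * u) for bk, lk in zip(b, lam))

def spectral_jumps(b, lam):
    """Points of increase of F with their jumps: b_k^2 / 2 at -lam_k and +lam_k."""
    jumps = []
    for bk, lk in zip(b, lam):
        jumps.append((-lk, bk ** 2 / 2))
        jumps.append((lk, bk ** 2 / 2))
    return sorted(jumps)

def stieltjes_cos(u, jumps):
    """Integral of cos(u x) dF(x) for a pure jump function F."""
    return sum(jump * math.cos(u * x) for x, jump in jumps)

b = [0.6, 0.8]        # hypothetical coefficients with b_1^2 + b_2^2 = 1
lam = [1.0, 2.5]      # hypothetical frequencies
jumps = spectral_jumps(b, lam)
for u in (0.0, 0.9, 4.2):
    print(u, correlation(u, b, lam), stieltjes_cos(u, jumps))
```

The jumps add up to 1, so this $F$ is indeed a distribution function, and the two columns printed for each `u` agree.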

Stochastic processes for which the function $F(x)$ increases by
jumps alone are called processes with discrete spectra.

It is easy to see that any process of the type

$$\xi(t) = \sum_{k=1}^{\infty} b_k \xi_k(t) \tag{4}$$

where $\sum_{k=1}^{\infty} b_k^2 = 1$ and $\xi_k(t)$ have the same meaning as in Example 2
is stationary in the broad sense and has a discrete spectrum. It
is important to point out that E. E. Slutsky found a profound
converse proposition: Any stationary process with a discrete spectrum
is representable in the form of (4). Generalization of Slutsky's
theorem to the case of an arbitrary spectrum will be formulated
in the next section.

In parallel with the theory of stationary processes, there
developed a theory of stationary sequences. A sequence of random
variables

$$\ldots, \xi_{-2}, \xi_{-1}, \xi_0, \xi_1, \xi_2, \ldots \tag{5}$$

is called stationary if for any integral $n$, $u$ and $t_j$ ($1 \le j \le n$)
condition (1) is satisfied. Similarly, sequence (5) is called stationary
in the broad sense if for all terms of the sequence the expectations
and variances are constant numbers that are independent
of their place in the sequence

$$\ldots = \mathsf{M}\xi_0 = \mathsf{M}\xi_1 = \mathsf{M}\xi_2 = \ldots$$
$$\ldots = \mathsf{D}\xi_{-2} = \mathsf{D}\xi_{-1} = \mathsf{D}\xi_0 = \mathsf{D}\xi_1 = \ldots$$

and the correlation coefficient of $\xi_i$ and $\xi_j$ is a function solely of
$|i - j|$.

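By way of an exercise, we suggest that the reader prove the
theorem: if for a stationary sequence

$$\lim_{s \to \infty} R(s) = 0$$

where $R(s)$ is the correlation coefficient of $\xi_i$ and $\xi_{i+s}$, then the
law of large numbers applies; that is, as $n \to \infty$,

$$\mathsf{P}\Big\{\Big|\frac{1}{n} \sum_{k=1}^{n} \xi_k - a\Big| < \varepsilon\Big\} \to 1 \quad (a = \mathsf{M}\xi_k)$$

no matter what the constant $\varepsilon > 0$.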




Sec. 58. The Concept of a Stochastic Integral. 

The Spectral Decomposition of Stationary Processes 

For what follows we have to introduce the concept of a stochastic
integral. Let a stochastic process $\xi(t)$ and a numerical function $f(t)$
be given on the interval $a \le t \le b$. Partition the interval $[a, b]$
by points $a = t_0 < t_1 < \ldots < t_n = b$ and consider the sum

$$I_n = \sum_{i=1}^{n} f(t_i)\,\xi(t_i)\,\Delta t_i, \qquad \Delta t_i = t_i - t_{i-1}$$

If as $\max_{1 \le i \le n} (t_i - t_{i-1}) \to 0$ this sum tends to a certain limit (which,
generally speaking, is a random variable), then the limit is called
the integral of the stochastic process $\xi(t)$ and is denoted by the
symbol

$$I = \int_a^b f(t)\,\xi(t)\,dt$$

The improper integral (for $a = -\infty$, $b = \infty$) is defined in the
usual manner as the limit of proper integrals as $a \to -\infty$, $b \to \infty$.

The convergence of the integral sums $I_n$ is to be understood in
the following sense: there exists a random variable $I$ such that
as $n \to \infty$

$$\mathsf{M}(I_n - I)^2 \to 0 \tag{1}$$

Proceeding from familiar theorems in the theory of functions of a
real variable, it is easy to prove that the sequence of random
variables $I_n$ converges to the limit $I$ in the sense of (1) if and
only if

$$\mathsf{M}(I_n - I_m)^2 \to 0 \tag{2}$$

as $\min(m, n) \to \infty$. We shall not dwell on the proof of this fact.
Theorem 1. For the integral

$$I = \int_a^b f(t)\,\xi(t)\,dt$$

to exist it is sufficient that the integral

$$A = \int_a^b \int_a^b R(t - s)\,f(t)\,f(s)\,ds\,dt$$

should exist. Moreover,

$$A = \mathsf{M}\Big[\int_a^b f(t)\,\xi(t)\,dt\Big]^2$$


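Proof. To prove the first half of the theorem it will suffice to
notice that if the integral $A$ exists, then the relation (2) holds.
And we have

$$\begin{aligned}
\mathsf{M}(I_n - I_m)^2 &= \mathsf{M}\Big[\sum_{i=1}^{n} f(t_i)\xi(t_i)\,\Delta t_i\Big]^2 - 2\,\mathsf{M} \sum_{i=1}^{n} \sum_{j=1}^{m} f(t_i) f(s_j)\,\xi(t_i)\xi(s_j)\,\Delta t_i\,\Delta s_j + \mathsf{M}\Big[\sum_{j=1}^{m} f(s_j)\xi(s_j)\,\Delta s_j\Big]^2 \\
&= \sum_{i=1}^{n} \sum_{k=1}^{n} f(t_i) f(\tau_k) R(t_i - \tau_k)\,\Delta t_i\,\Delta\tau_k - 2 \sum_{i=1}^{n} \sum_{j=1}^{m} f(t_i) f(s_j) R(t_i - s_j)\,\Delta t_i\,\Delta s_j \\
&\quad + \sum_{j=1}^{m} \sum_{k=1}^{m} f(s_j) f(\sigma_k) R(s_j - \sigma_k)\,\Delta s_j\,\Delta\sigma_k
\end{aligned}$$

Here, the numerical values of $t_i$ and $\tau_i$, and also $s_j$ and $\sigma_j$, coincide.
By virtue of the assumption that the integral $A$ exists,

$$\begin{aligned}
A &= \lim \sum_{i=1}^{n} \sum_{k=1}^{n} f(t_i) f(\tau_k) R(t_i - \tau_k)\,\Delta t_i\,\Delta\tau_k \\
&= \lim \sum_{i=1}^{n} \sum_{j=1}^{m} f(t_i) f(s_j) R(t_i - s_j)\,\Delta t_i\,\Delta s_j \\
&= \lim \sum_{j=1}^{m} \sum_{k=1}^{m} f(s_j) f(\sigma_k) R(s_j - \sigma_k)\,\Delta s_j\,\Delta\sigma_k
\end{aligned}$$

so long as $\max[\Delta t_i, \Delta s_j] \to 0$. Thus, as $\min(m, n) \to \infty$,

$$\mathsf{M}(I_n - I_m)^2 \to 0$$

To prove the second part of the theorem, note that

$$\mathsf{M}\Big[\sum_{i=1}^{n} f(t_i)\xi(t_i)\,\Delta t_i\Big]^2 = \mathsf{M} \sum_{i=1}^{n} \sum_{j=1}^{n} f(t_i) f(\tau_j)\,\xi(t_i)\xi(\tau_j)\,\Delta t_i\,\Delta\tau_j = \sum_{i=1}^{n} \sum_{j=1}^{n} f(t_i) f(\tau_j) R(t_i - \tau_j)\,\Delta t_i\,\Delta\tau_j$$

As $\max_{1 \le i \le n} \Delta t_i \to 0$, the last sum tends to the integral $A$.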
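
The second part of Theorem 1 lends itself to a small numerical check. The sketch below is an illustration under assumptions not made in the text: it takes the process $\xi \cos \lambda t + \eta \sin \lambda t$ of Example 1, Sec. 57 (correlation function $\cos \lambda u$), the function $f(t) = 1$, and a uniform partition with right endpoints. For that process the integral sum is a linear combination of the two random coefficients, so its second moment can be computed exactly and compared with the double Riemann sum and with the closed form of $A$, which for these choices is $2(1 - \cos \lambda(b - a))/\lambda^2$.

```python
import math

def riemann_points(a, b, n):
    """Right endpoints t_1..t_n of a uniform partition of [a, b] and the step."""
    step = (b - a) / n
    return [a + i * step for i in range(1, n + 1)], step

def double_sum(f, R, a, b, n):
    """Sum over i, j of f(t_i) f(t_j) R(t_i - t_j) dt dt, which approximates A."""
    ts, step = riemann_points(a, b, n)
    return sum(f(t) * f(s) * R(t - s) for t in ts for s in ts) * step * step

lam, a, b, n = 2.0, 0.0, 3.0, 400    # hypothetical parameter values

def f(t):
    return 1.0

def R(u):
    return math.cos(lam * u)          # correlation function of Example 1, Sec. 57

# For xi(t) = xi cos(lam t) + eta sin(lam t) the integral sum equals
# xi * C + eta * S, so its second moment is C^2 + S^2
# (xi and eta uncorrelated, zero mean, unit variance).
ts, step = riemann_points(a, b, n)
C = sum(f(t) * math.cos(lam * t) for t in ts) * step
S = sum(f(t) * math.sin(lam * t) for t in ts) * step
second_moment = C * C + S * S

approx_A = double_sum(f, R, a, b, n)
exact_A = 2.0 * (1.0 - math.cos(lam * (b - a))) / lam ** 2   # closed form for f = 1
print(second_moment, approx_A, exact_A)
```

The first two numbers coincide up to rounding (this is the identity used in the proof), and both approach the third as the partition is refined.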

In addition to the notion of a stochastic integral that has just
been introduced, we can also consider a stochastic Stieltjes integral,
which we define as the limit of the sums

$$\sum_{k=1}^{n} f(t_k)\,[\xi(t_k) - \xi(t_{k-1})] \tag{3}$$

as $\max (t_i - t_{i-1}) \to 0$. As before, $a = t_0 < t_1 < \ldots < t_n = b$ and the
limit is to be understood in the sense of (1). If the limit of the
sums (3) exists, we will denote it by the symbol

$$\int_a^b f(t)\,d\xi(t)$$

At the conclusion of Sec. 57 we formulated Slutsky's theorem,
which expresses the relationship between stationary processes with
discrete spectra and Fourier series with random uncorrelated
coefficients. It may be proved that the following property holds
for every stationary process (in the broad sense): For any $\varepsilon > 0$
and arbitrarily large $T$, there exist pairwise uncorrelated random
variables $\xi_1, \ldots, \xi_n$, $\eta_1, \ldots, \eta_n$ and numbers
$\lambda_1, \ldots, \lambda_n$ * such that for any $t$ of the interval $-T \le t \le T$, the
inequality

$$\mathsf{M}\Big[\xi(t) - \sum_{k=1}^{n} (\xi_k \cos \lambda_k t + \eta_k \sin \lambda_k t)\Big]^2 < \varepsilon$$

holds. From this, in particular, it follows that under the given
conditions

$$\mathsf{P}\Big\{\Big|\xi(t) - \sum_{k=1}^{n} (\xi_k \cos \lambda_k t + \eta_k \sin \lambda_k t)\Big| > \eta\Big\} < \frac{\varepsilon}{\eta^2}$$

where $\eta$ is a preassigned positive number.

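The following important theorem is given without proof.

Theorem 2. Any stochastic process that is stationary in the broad
sense is representable in the form

$$\xi(t) = \int_0^{\infty} \cos \lambda t \, dZ_1(\lambda) + \int_0^{\infty} \sin \lambda t \, dZ_2(\lambda) \tag{4}$$

where the stochastic processes $Z_1(\lambda)$ and $Z_2(\lambda)$ (the variable $\lambda \ge 0$)
possess the following properties:

(a) $\mathsf{M}\{[Z_i(\lambda_1 + \Delta\lambda_1) - Z_i(\lambda_1)]\,[Z_j(\lambda_2 + \Delta\lambda_2) - Z_j(\lambda_2)]\} = 0$ ($i, j = 1, 2$)
if $i \ne j$; and if the intervals $(\lambda_1, \lambda_1 + \Delta\lambda_1)$ and $(\lambda_2, \lambda_2 + \Delta\lambda_2)$ are
nonoverlapping, then $i$ and $j$ may be equal;

(b) $\mathsf{M}[Z_1(\lambda + \Delta\lambda) - Z_1(\lambda)]^2 = \mathsf{M}[Z_2(\lambda + \Delta\lambda) - Z_2(\lambda)]^2$.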

* The numbers $n$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$ and also the variables $\xi_i$ and $\eta_i$ are
dependent on $\varepsilon$ and $T$.


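It is natural to call formula (4) the spectral decomposition of
the process $\xi(t)$.

The stochastic processes $Z_1(\lambda)$ and $Z_2(\lambda)$ of formula (4) may be
determined by the equalities

$$Z_1(\lambda) = \lim_{T \to \infty} \frac{1}{\pi} \int_{-T}^{T} \frac{\sin \lambda t}{t}\,\xi(t)\,dt$$

and

$$Z_2(\lambda) = \lim_{T \to \infty} \frac{1}{\pi} \int_{-T}^{T} \frac{1 - \cos \lambda t}{t}\,\xi(t)\,dt$$

It is easy to prove that both integrals exist (this is done by means
of Khinchin's formula that was proved in Sec. 57). It will also be
seen that

$$F(\lambda + \Delta\lambda) - F(\lambda) = \mathsf{M}[Z_1(\lambda + \Delta\lambda) - Z_1(\lambda)]^2$$

where the function $F(\lambda)$ is determined by Khinchin's theorem.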

The possibility of decomposing into the form (4) an arbitrary
stochastic process which is stationary in the broad sense was pointed
out in 1940 by A. N. Kolmogorov, who stated the result in terms of
Hilbert space geometry and proved it by means of the spectral theory
of operators. Since then, many authors, such as Cramer, Karhunen,
Loeve, Blanc-Lapierre and others, have contributed to the
probabilistic interpretation and derivation of this decomposition.

We shall not speak here of the applications of the spectral
decomposition to problems in the theory of oscillations and geophysics; we
refer the reader to the works of A. M. Yaglom, Blanc-Lapierre and
Fortet, which are given in the bibliography at the end of the book.


Sec. 59. The Birkhoff-Khinchin Ergodic Theorem 

In 1931 the American mathematician George Birkhoff proved a
general theorem of mechanics, which, as A. Ya. Khinchin demonstrated
three years later, allows for a broad probabilistic generalization. The
theorem is as follows: if a continuous stationary process $\xi(t)$ has a finite
expectation, then with probability one there exists the limit

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \xi(t)\,dt$$

The stationarity of the process is here understood in the strict and
not broad sense.





Since this theorem is a peculiar form of the strong law of large
numbers, we will prove it for stationary sequences (not for stationary
processes) so as to continue directly the formulations of
Chapter 6.

Theorem. For a stationary sequence of random variables

$$\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots$$

for which $\mathsf{M}\xi_i$ is finite, the sequence of arithmetic means

$$\frac{1}{n} \sum_{i=1}^{n} \xi_i$$

converges to a limit with probability one.

Proof. We denote

$$h_{ab} = \frac{\xi_a + \xi_{a+1} + \ldots + \xi_{b-1}}{b - a}$$

We have to prove that with probability one the quantities $h_{0b}$ tend
to a limit as $b \to \infty$. Denote the random event that this limit exists
by the letter $\bar{K}$. We must prove that $\mathsf{P}\{\bar{K}\} = 1$ or, which is the same
thing, that $\mathsf{P}\{K\} = 0$.

We assume the contrary, that the event $K$ (that is, that the quantities
$h_{0b}$ do not converge to a limit as $b \to \infty$) has a positive probability
and we will demonstrate that this assumption leads to a contradiction.

For this purpose we consider all intervals $(\alpha_n, \beta_n)$ with rational
endpoints, $\alpha_n < \beta_n$. The set of all such intervals is countable. If $\lim h_{0b}$
does not exist, there will be an interval $(\alpha_n, \beta_n)$ for which
$\limsup_{b \to \infty} h_{0b} > \beta_n$ and $\liminf_{b \to \infty} h_{0b} < \alpha_n$ (event $K_n$). Thus, the event $K$
decomposes into a countable set of cases $K_n$. Since by assumption
$\mathsf{P}(K) > 0$, an $n$ can be found such that $\mathsf{P}(K_n) > 0$.
It is thus proven that if $\mathsf{P}(K) > 0$, there exist two numbers $\alpha$ and
$\beta$ ($\alpha < \beta$) for which the following inequalities hold simultaneously
with positive probability:

$$\limsup_{b \to \infty} h_{0b} > \beta, \qquad \liminf_{b \to \infty} h_{0b} < \alpha \tag{1}$$

Now suppose that all $\xi_j$ have taken on certain definite values. If
the interval $(a, b)$ is such that $h_{ab} > \beta$, but for all $b'$ for which
$a < b' < b$, $h_{ab'} \le \beta$, then this interval will be called special (with
respect to $\beta$).

It will readily be seen that two special intervals do not overlap.
Indeed, if two special intervals $(a, b)$ and $(a_1, b_1)$ are such that
$a < a_1 < b < b_1$, then from the equality

$$h_{ab} = \frac{(a_1 - a)\,h_{aa_1} + (b - a_1)\,h_{a_1 b}}{b - a}$$

and the inequality $h_{ab} > \beta$ there follows either $h_{aa_1} > \beta$ or $h_{a_1 b} > \beta$.
However, the first of these inequalities is impossible because the
interval $(a, b)$ is special, the second is also impossible since the interval
$(a_1, b_1)$ is special.
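
The non-overlap property can be checked mechanically on a finite stretch of data. The sketch below is an illustration only: it uses a hypothetical finite integer sequence indexed from 0 (the text works with a two-sided infinite sequence), finds every special interval with respect to a chosen $\beta$, and confirms that no two of them satisfy $a < a_1 < b < b_1$. Sums are compared instead of means so that the arithmetic stays exact.

```python
import random

def special_intervals(x, beta):
    """All special intervals (a, b) of the finite sequence x with respect to beta.

    h_ab is the mean of x[a], ..., x[b-1]; the test h_ab > beta is written
    as sum > beta * (b - a) to avoid division.
    """
    n = len(x)
    result = []
    for a in range(n):
        s = 0
        for b in range(a + 1, n + 1):
            s += x[b - 1]
            if s > beta * (b - a):      # first b with h_ab > beta
                result.append((a, b))
                break
    return result

def crossing_pairs(intervals):
    """Pairs of intervals with a < a1 < b < b1 (partial overlap)."""
    return [(p, q) for p in intervals for q in intervals
            if p[0] < q[0] < p[1] < q[1]]

rng = random.Random(1)
x = [rng.randint(-5, 5) for _ in range(200)]   # hypothetical realized values
intervals = special_intervals(x, beta=1)
print(len(intervals), crossing_pairs(intervals))
```

Nested or disjoint special intervals may occur; partially overlapping ones do not, in agreement with the argument above.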

The difference $b - a$ will be called the rank (length) of the interval
$(a, b)$. If an interval $(a, b)$ is special, is of a rank not exceeding $s$, and is
not contained in any other special interval of a rank not exceeding $s$, then
such an interval will be called $s$-special.

Among the special intervals that contain a given special interval
$(a, b)$ of length not exceeding $s$ and that themselves have a length that does
not exceed $s$, there must be at least one of greatest length; if there
were two they would overlap, which is impossible on the basis of
what has already been proved. Thus, every special interval of length
not exceeding $s$ lies inside exactly one $s$-special interval (or coincides with
it). From the definition it follows that two $s$-special intervals can only
lie one outside the other.

We denote by $K_s$ the event that inequalities (1) are valid and,
besides, that there exists a $t \le s$ such that $h_{0t} > \beta$. Since $K$ is the limit
for the events $K_s$,

$$\mathsf{P}(K) = \lim_{s \to \infty} \mathsf{P}(K_s)$$

From this it follows that for all sufficiently large $s$ the inequality
$\mathsf{P}(K_s) > 0$ is valid. From now on we will confine ourselves solely to
such values of $s$.

Let the event $K_s$ take place. Then among the values of $t \le s$
for which $h_{0t} > \beta$ there exists a least $t'$. The interval $(0, t')$ is special.
Consequently, it lies in some $s$-special interval $(a, b)$ (or is such itself),
for which $a \le 0 < b$. The converse is also true: if there exists an
$s$-special interval $(a, b)$ for which $a \le 0 < b$, then there exists a $t \le s$
such that $h_{0t} > \beta$. For $a = 0$ this is obvious: it suffices to put $t = b$.
But if $a < 0$, then from the equality

$$h_{ab} = \frac{-a\,h_{a0} + b\,h_{0b}}{b - a}$$

and the inequalities $h_{ab} > \beta$, $h_{a0} \le \beta$ there follows $h_{0b} > \beta$. Thus,
in this case too it is possible to set $t = b$.

We denote $-a$ by $p$ and $b - a$ by $q$. Since the $s$-special interval
$(-p, -p + q)$ can only exist alone, the event $K_s$ is decomposed
into the mutually exclusive cases $K_{pq}$, which correspond to the
existence of $s$-special intervals $(-p, -p + q)$:

$$K_s = \sum_{p, q} K_{pq} \quad (q = 1, \ldots, s;\ p = 0, 1, \ldots, q - 1)$$

Changing the numbering of the sequence by the shift $j \to j + p$ transforms the
case $K_{pq}$ into the case $K_{0q}$. Therefore, by virtue of stationarity*,
$\mathsf{P}(K_{pq}) = \mathsf{P}(K_{0q})$ and $\mathsf{M}(\xi_0 / K_{pq}) = \mathsf{M}(\xi_p / K_{0q})$. Since

$$\mathsf{P}(K_s)\,\mathsf{M}(\xi_0 / K_s) = \sum_{p, q} \mathsf{P}(K_{pq})\,\mathsf{M}(\xi_0 / K_{pq}) = \sum_{q} \mathsf{P}(K_{0q}) \sum_{p} \mathsf{M}(\xi_p / K_{0q}) = \sum_{q} \mathsf{P}(K_{0q})\,\mathsf{M}(q\,h_{0q} / K_{0q})$$

we find, by taking into account that in the event of $K_{0q}$ the
inequality $h_{0q} > \beta$ holds, that

$$\mathsf{P}(K_s)\,\mathsf{M}(\xi_0 / K_s) > \sum_{q} \mathsf{P}(K_{0q})\,q\beta = \beta \sum_{p, q} \mathsf{P}(K_{pq}) = \beta\,\mathsf{P}(K_s)$$

Whence, since by assumption $\mathsf{P}(K_s) \ne 0$, it follows that

$$\mathsf{M}(\xi_0 / K_s) > \beta$$

Since $K_s \to K$, we have

$$\mathsf{M}(\xi_0 / K) \ge \beta$$

In similar fashion (if special intervals were considered with
respect to $\alpha$) it is possible to prove that

$$\mathsf{M}(\xi_0 / K) \le \alpha$$

This is a contradiction. And so it follows that $\mathsf{P}(K) = 0$, which
is what we set out to prove.

* Note that only at this point have we made use of the assumption of
stationarity.

To investigate the limit to which the quantities $h_{0n}$ tend as
$n \to \infty$ requires additional arguments. We confine ourselves here
to the proof of the following theorem.

Theorem. If the random variables $\xi_k$ are stationary, have finite
variance and the correlation function $R(k) \to 0$ as $k \to \infty$, then

$$\mathsf{P}\Big\{\lim_{n \to \infty} h_{0n} = a\Big\} = 1 \quad (a = \mathsf{M}\xi_k)$$

Proof. Consider the variance of the quantity $h_{0n}$. By virtue of
stationarity we have

$$\mathsf{D}h_{0n} = \mathsf{M}\Big[\frac{1}{n} \sum_{k=1}^{n} (\xi_k - a)\Big]^2 = \frac{\sigma^2}{n^2} \Big[n + 2 \sum_{1 \le i < j \le n} R(j - i)\Big]$$

where $\sigma^2 = \mathsf{D}\xi_k$. It is obvious that

$$\sum_{1 \le i < j \le n} R(j - i) = \sum_{k=1}^{n-1} (n - k)\,R(k)$$

Consider an $m$ so large that for $k > m$ we have the inequality

$$|R(k)| \le \varepsilon \quad (\varepsilon > 0)$$

From this it follows that

$$\mathsf{D}h_{0n} \le \frac{\sigma^2}{n^2} \Big[n + 2 \sum_{k=1}^{m} (n - k)\,R(k) + 2\varepsilon \sum_{k=m+1}^{n-1} (n - k)\Big]$$

This inequality is obviously strengthened as follows:

$$\mathsf{D}h_{0n} \le \frac{\sigma^2}{n^2} \big[n + 2m(n - 1) + \varepsilon(n - m - 1)(n - m)\big]$$

From this it is clear that if $n$ is sufficiently great, the right side of
this inequality may be made less than $3\varepsilon\sigma^2$. Thus, as $n \to \infty$ the
quantities $h_{0n}$ converge in probability to $a$, and since, by the preceding
theorem, $h_{0n}$ converge with probability one as $n$ tends to infinity, the
assertion of the theorem is obviously correct.
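
The variance formula used in this proof is easy to evaluate numerically. The sketch below is an illustration with a hypothetical correlation function $R(k) = 0.5^k$ and unit variance, neither of which comes from the text; it checks the identity between the sum over pairs and the weighted sum by brute force, and prints the variance of the arithmetic mean for increasing $n$.

```python
def variance_of_mean(n, R, sigma2=1.0):
    """D h_0n = (sigma^2 / n^2) [n + 2 sum_{k=1}^{n-1} (n - k) R(k)]."""
    return sigma2 / n ** 2 * (n + 2 * sum((n - k) * R(k) for k in range(1, n)))

def pair_sum(n, R):
    """Brute-force sum of R(j - i) over all pairs 1 <= i < j <= n."""
    return sum(R(j - i) for i in range(1, n + 1) for j in range(i + 1, n + 1))

def R(k):
    return 0.5 ** k    # hypothetical correlation function with R(k) -> 0

for n in (10, 100, 1000):
    print(n, variance_of_mean(n, R))
```

The printed values decrease roughly like $3/n$ for this choice of $R$, which is the behaviour the estimate in the proof guarantees in general.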

The above-proved theorem is not only of considerable theoretical
interest, but finds extensive applications in statistical physics and
in engineering practice as well. The reason is that to determine such
important characteristics of a phenomenon as $\mathsf{M}\xi(t)$, $\mathsf{D}\xi(t)$, $R(u)$
in the case of stationary processes, one does not need to know the
probability distribution of the possible values or to calculate these
quantities from appropriate formulas. The determination of these
spatial averages, to use the physics term, demands of the investigator
information that is often lacking. At any rate, the experimental
estimation of these quantities requires repeated realization of trials
for the process [that is, numerous realizations of the function $\xi(t)$
have to be obtained from experiment]. The Birkhoff-Khinchin ergodic
theorem shows that it is possible, with probability one, to confine
oneself (under specific conditions) to a single realization of the
process $\xi(t)$.
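
The practical point, that a time average over one realization can stand in for the expectation, can be illustrated by simulation. The sketch below is not from the text: it assumes a particular stationary sequence (a Gaussian first-order autoregression with correlation function $\rho^k$, started in its stationary law) and hypothetical parameter values, generates a single long realization, and compares its arithmetic mean with the expectation.

```python
import math
import random

def stationary_ar1(n, rho, mean, seed=0):
    """One realization of a stationary Gaussian sequence with expectation
    `mean`, unit variance and correlation function R(k) = rho**k."""
    rng = random.Random(seed)
    value = rng.gauss(0.0, 1.0)             # start in the stationary law
    scale = math.sqrt(1.0 - rho * rho)
    path = []
    for _ in range(n):
        path.append(mean + value)
        value = rho * value + scale * rng.gauss(0.0, 1.0)
    return path

path = stationary_ar1(n=200_000, rho=0.5, mean=3.0)
time_average = sum(path) / len(path)
print(time_average)    # close to the expectation 3.0
```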



Chapter 11. ELEMENTS OF QUEUEING THEORY


Sec. 60. A General Description of the Problems 
of the Theory 

Of the numerous and profound applications of the theory of
stochastic processes to various problems of physics, biology, engineering
and economics we consider here only one, which in recent years has
seen considerable development under the impetus of the diversified
demands of practice. Originally, the specific problems that lead to the
theory of mass-scale service, as it is termed in the Soviet literature
(or of the queueing process), emerged in connection with the operation
of telephone systems. Later it was found that similar problems arise
in merchandizing (computing the number of shops, salesmen, cash
registers, supplies, etc.), in the operation of production equipment,
in calculating the traffic capacity of roads, bridges, crossings,
aerodromes, canal locks, seaports, and so forth. A. K. Erlang, a Danish
scientist engaged for many years in the Copenhagen Telephone
Company, played the basic role in formulating and solving the first
mathematical problems of this nature. Today they constitute but a small
fraction of the problems of this nature that have been elaborated.
Interest in queueing theory shot up throughout the world and the
number of theoretical and applied studies in this field far exceeds a
thousand.

To get an idea of the peculiarities of stating problems in queueing
theory, we shall first examine a few problems of an applied nature
and remain on a purely qualitative level. We will then take one problem
(given the most elementary assumptions) and will study the classical
methods of Erlang. The sequel will be clear from the table of contents.

Suppose that a telephone exchange receives calls from its subscribers.
If at the time of arrival of a call there are open lines, the subscriber
is switched to one of them and a conversation is started that lasts as
long as is necessary. If all lines are engaged, various systems of
servicing the subscriber are possible. At the present time, two systems have
been worked out in detail: a waiting system and a system involving
losses. In the former, a call that arrives at the exchange when all
lines are busy is put in a waiting line and is made to wait until all
the calls that arrived earlier have been completed. In the latter
system, a call arriving at the exchange when all lines are engaged gets
a refusal, and all subsequent servicing proceeds as if that call had
not arrived (we say that this is “loss of a call”).

For us at this point it is important to stress two peculiarities that 
must be taken into account when considering the problems that arise. 
The first is that the calls arrive at the telephone exchange at random 
instants of time and it is not possible to predict in advance when the 
next call will be made. Similarly, the duration of talks is not a
constant factor and varies at random. Later on we shall return to a more
detailed consideration of these two peculiarities that are found in 
all problems of queueing theory. 

The servicing systems involving waiting and losses do not only
differ as to engineering aspects of the devices that handle them but
also in the mathematical problems that emerge in their study. Indeed,
to estimate the quality of servicing in the waiting system it is
particularly essential to determine the mean waiting time for the start
of service, that is, the mean waiting (sojourn) time in a queue. For
systems involving losses, the waiting time is of no interest at all,
whether engineering or mathematical. Here the important thing is
the probability of congestion (loss of a call). But whereas in the
second servicing system the probability of congestion affords a
sufficiently complete picture of what may be expected in the given setup,
the situation is more complicated in the waiting system. Despite
its importance, the mean waiting time is not an exhaustive
characteristic of the quality of service. Another very important factor is the
spread of the waiting time about the mean. Also of interest is the
distribution of the length of the waiting line, the extent of the load on
the service equipment, the distribution of duration of continuous
operation of the equipment.

The situation at a ticket office with a waiting line is similar to the 
system of servicing subscribers at a telephone exchange with queues. 
Some large factories have tool-supply depots. If such a depot services 
a large number of workmen, skilled workers find that they lose time 
waiting; if there are more depots, the supply clerks are idle much of 
the time. A similar problem arises in organizing the work of a seaport. 
Cargo ships do not arrive in port according to a fixed schedule, and 
loading and unloading times vary. If there is a lack of docking
facilities, the ships spend appreciable times waiting, and this represents
a loss, economically speaking. But surplus docking facilities increase
the idle time of equipment and workmen. This gives rise to an important
economic problem of determining the optimal range of docking
facilities for handling a given freight turnover at minimal loss
involved in maintenance of ships and handling equipment.





During the 1930s, in connection with expanding automatization
of machine-tools in industry, there was a trend towards one operator
servicing a number of machines. At random times, for various reasons,
the machines go out of commission and require the attention of the
workman (repairman). The duration of the repair operation is, generally
speaking, not constant and is a random variable. The question
then arises: How great is the probability that at a given instant of
time a certain number of machines will be waiting to be serviced?
Important practical questions follow: What is the mean idling time
of the machines in charge of one workman? For a given setup, how
many machines can one workman handle most efficiently?

Many problems of a scientific, manufacturing and economic nature 
involve more than just systems with waiting and losses. In everyday 
affairs we ourselves know how frequently one has to refuse service 
because of long waiting times. For example, in interurban
(long-distance) telephone calls we often have to limit the waiting time and
warn the operator that if the connection is not made within a
specified time, the call is to be cancelled. The problem is much the same
in the sale of perishable food, in the organization of medical aid, the 
operation of a large airport, and so forth. 

It is therefore quite natural to formulate the following group ot 
related problems. A service system receives certain requests. If there 
is at least one free server (servicing device), the incoming request is 
handled immediately. If all the servers are busy, a fresh request gets 
in line: (a) if there are no more than a given number $m$ of requests; 
(b) for a length of time not greater than $\tau$ (this time is constant or 
depends on chance); (c) for as long as is necessary but is serviced 
during a time not greater than $\tau$ (at the expiration of this time the 
request leaves the waiting line even if not completely serviced); 
(d) but in such manner that the sojourn time in the system (the total 
waiting time and servicing time) does not exceed $\tau$. 

In the foregoing problems, we proceeded on the assumption that 
the servicing devices (servers) were absolutely reliable and were constantly 
in operation. This is, of course, far removed from any actual situation. 
There naturally arises the important problem of taking into 
account the effect of malfunctioning (breakdowns) of servers on the 
effectiveness of a service system. 

From now on we shall speak of demands made on servers for service. 
The totality of moments at which demands for service arise constitutes 
a stochastic process. We will call this process the incoming flow of 
demands (incoming traffic). The incoming traffic may be described 
by the process $k(t)$, which signifies the number of demands that arrive 
between time 0 and time $t$. In the overwhelming majority of papers 
dealing with queueing theory it is assumed that the incoming traffic 
constitutes a Poisson process (or, as it is sometimes called, an elementary 
flow), which was described in Sec. 51. There, the conditions were 
given under which an elementary flow occurs. Later on, in Sec. 63, 
we will give other conditions that will ensure an elementary flow. 

The service time is a random variable with a certain distribution 
function $H(x)$. It is very often considered, in investigations of both 
a theoretical and applied nature, that $H(x) = 0$ for $x \le 0$ and 
$H(x) = 1 - e^{-\nu x}$ for $x > 0$, where $\nu$ is a positive constant. This choice is not 
by chance but is due to a number of circumstances associated with 
simplicity of solution (this will be discussed later on). For the present 
we confine ourselves to the proof of an important property of expo¬ 
nential distribution. 

Theorem. If the service time has an exponential distribution, that is, for 
$x > 0$ 

$$H(x) = 1 - e^{-\nu x} \tag{1}$$

where $\nu$ is a positive constant, then the distribution of the remaining part 
of the service time does not depend on how long the servicing has been 
in progress. 

Proof. Let $h_a(t)$ denote the probability that the servicing, which 
has already continued for a time $a$, will continue for at least a time $t$. 
From assumption (1) it is clear that 

$$h_0(a) = e^{-\nu a}$$

Since by the multiplication theorem 

$$h_0(a + t) = h_0(a)\,h_a(t)$$

it follows that 

$$e^{-\nu(a + t)} = e^{-\nu a}\,h_a(t)$$

From this we get 

$$h_a(t) = e^{-\nu t}$$

and the proof is complete. 
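
The property just proved can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `survival` and `remaining_survival` and the values of the rate and of the times are illustrative assumptions. It evaluates $h_a(t) = h_0(a+t)/h_0(a)$ for several values of $a$ and compares the result with $h_0(t)$.

```python
import math

def survival(t, nu):
    """h_0(t): probability that a service time with H(x) = 1 - exp(-nu*x) exceeds t."""
    return math.exp(-nu * t)

def remaining_survival(a, t, nu):
    """h_a(t): probability that a service already lasting a time a
    continues for at least a further time t (multiplication theorem)."""
    return survival(a + t, nu) / survival(a, nu)

nu = 0.5   # illustrative service rate
t = 2.0    # illustrative remaining time
for a in (0.0, 1.0, 5.0, 20.0):
    # The two printed numbers should coincide up to rounding for every a.
    print(a, remaining_survival(a, t, nu), survival(t, nu))
```

Whatever elapsed time $a$ is chosen, the two printed values agree up to floating-point rounding, which is the content of the theorem.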

To illustrate problems of queueing theory let us examine service 
with loss under conditions that were studied by Erlang. 

There are $n$ servers that receive an elementary incoming flow of 
demands. Every device (server) is accessible to any demand when it 
is free. Every demand is serviced by one server only, and every server 
serves only one demand (when it is busy). A demand that finds all 
servers busy servicing other demands is lost. Our problem consists 
in finding the probability of congestion. 

Let us consider the process of change of state of our system of ser¬ 
vice. At each instant of time it can be in one of the following states: 
$E_0$ signifies that all servers are free; $E_1$, that one server is busy and the 
remaining ones are free; ...; $E_n$ means that all servers (devices) are busy. 
Let us see what peculiarities this process has under the assumptions 
that we have made. 





At some time $t_0$ let our system be in state $E_k$. We shall prove that 
the subsequent course of the process is fully determined by this and 
does not depend on what occurred prior to time $t_0$. In other words, 
the process under consideration is a Markov process. Indeed, the subsequent 
course of the process is determined completely by the following 
three factors: 

(1) the times at which the servicings that are in progress at $t_0$ 
terminate; 

(2) the times at which new demands appear; 

(3) the duration of service of demands that appear in the system 
after $t_0$. 

By the peculiarity (just proven) of an exponential distribution, 
the durations of the remaining parts of service are not dependent 
on how long the service continued prior to $t_0$. Since the traffic (flow 
of demands) is elementary, the past has no effect on how many demands 
arrive after $t_0$. Finally, the service time of demands appearing 
after $t_0$ is in no way dependent on what was serviced (and how) prior 
to this moment. This shows that the process of variation of the system 
under study is Markovian. This circumstance is fundamental inas¬ 
much as it permits obtaining simple equations for those characteris¬ 
tics of the process that interest us. 

Denote by $p_k(t)$ the probability that at time $t$ the system is in state 
$E_k$. We now form the equations for the functions $p_k(t)$. 

First we find the probability that at time $t+h$ all the servers are 
free. This can occur in the following mutually exclusive ways: at 
time $t$ all servers were free and during time $h$ no new demands arrived; 
at time $t$ one server was busy, servicing was terminated during the 
subsequent time interval $h$, and no new demands arrived; the remaining 
possibilities (two servers were busy and during time $h$ they completed 
their servicing, etc.) have probability of the order of $o(h)$. 
The probability of the first event is 

$$p_0(t)\,e^{-\lambda h} = p_0(t)\,(1 - \lambda h + o(h))$$

the probability of the second event is 

$$p_1(t)\,e^{-\lambda h}\,(1 - e^{-\nu h}) = p_1(t)\,\nu h + o(h)$$

Thus, 

$$p_0(t + h) = p_0(t)\,(1 - \lambda h) + \nu h\,p_1(t) + o(h)$$

whence, by passing to the limit as $h \to 0$, we get the following 
equation: 

$$p_0'(t) = -\lambda p_0(t) + \nu p_1(t) \tag{2}$$

Reasoning in similar fashion for $1 \le k < n$ we get 

$$p_k'(t) = \lambda p_{k-1}(t) - (\lambda + k\nu)\,p_k(t) + (k + 1)\,\nu\,p_{k+1}(t) \tag{3}$$

and for $k = n$ 

$$p_n'(t) = \lambda p_{n-1}(t) - n\nu\,p_n(t) \tag{4}$$

The system we have obtained of linear differential equations 
permits us to find the desired functions $p_k(t)$. The arbitrary constants 
are determined by means of initial data, which we choose 
as follows: 

$$p_0(0) = 1, \quad p_k(0) = 0 \ \text{ for } k \ge 1$$

These equations mean that at the initial instant all servers were 
free. We add that the probabilities $p_k(t)$ must satisfy yet another 
normalizing condition: 

$$\sum_{k=0}^{n} p_k(t) = 1 \tag{5}$$
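
As a rough numerical illustration (not part of the original derivation), the system (2)-(4) with the initial data above can be integrated step by step. The sketch below uses a classical fourth-order Runge-Kutta step written in plain Python. The function names `erlang_rhs`, `rk4_step` and `solve`, the step size, and the parameter values are illustrative assumptions.

```python
def erlang_rhs(p, lam, nu):
    """Right-hand sides of equations (2)-(4) for the probabilities p[0..n]."""
    n = len(p) - 1
    dp = [0.0] * (n + 1)
    for k in range(n + 1):
        from_below = lam * p[k - 1] if k >= 1 else 0.0            # arrival in state E_{k-1}
        from_above = (k + 1) * nu * p[k + 1] if k < n else 0.0    # service ends in E_{k+1}
        arrival_out = lam * p[k] if k < n else 0.0                # in E_n an arrival is lost
        dp[k] = from_below - (arrival_out + k * nu * p[k]) + from_above
    return dp

def rk4_step(p, h, lam, nu):
    """One classical Runge-Kutta step of length h."""
    k1 = erlang_rhs(p, lam, nu)
    k2 = erlang_rhs([x + 0.5 * h * d for x, d in zip(p, k1)], lam, nu)
    k3 = erlang_rhs([x + 0.5 * h * d for x, d in zip(p, k2)], lam, nu)
    k4 = erlang_rhs([x + h * d for x, d in zip(p, k3)], lam, nu)
    return [x + h * (a + 2 * b + 2 * c + d) / 6.0
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)]

def solve(n, lam, nu, t_end, h=0.01):
    """Integrate from p_0(0) = 1, p_k(0) = 0 up to time t_end."""
    p = [1.0] + [0.0] * n
    for _ in range(int(round(t_end / h))):
        p = rk4_step(p, h, lam, nu)
    return p

p = solve(n=2, lam=1.0, nu=1.0, t_end=20.0)
print(p, sum(p))
```

The printed sum stays equal to 1 up to rounding, in agreement with (5), and for large $t$ the probabilities settle down to constant values.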

Interest ordinarily centres on studying a steady-state process, 
that is, the solution is considered as $t \to \infty$. As we shall see in the 
next section, under the conditions of our problem there exist the 
limits 

$$p_k = \lim_{t \to \infty} p_k(t)$$

and these limiting probabilities satisfy the following set of algebraic 
equations obtained from (2)-(5) by substituting the constants 
$p_k$ for the functions $p_k(t)$ and zeros for the derivatives $p_k'(t)$: 

$$\begin{aligned}
-\lambda p_0 + \nu p_1 &= 0 \\
\lambda p_{k-1} - (\lambda + k\nu)\,p_k + (k + 1)\,\nu\,p_{k+1} &= 0 \quad (1 \le k < n) \\
\lambda p_{n-1} - n\nu\,p_n &= 0 \\
\sum_{k=0}^{n} p_k &= 1
\end{aligned} \tag{6}$$

By putting $z_k = \lambda p_{k-1} - k\nu\,p_k$ we reduce the set of our equations 
to 

$$z_1 = 0, \quad z_k - z_{k+1} = 0 \ \text{ for } 1 \le k < n, \quad z_n = 0$$

whence we find that 

$$k\nu\,p_k = \lambda p_{k-1}, \quad k = 1, 2, \ldots, n$$

Simple transformations result in the equalities 

$$p_k = \frac{\rho^k}{k!}\,p_0 \quad \left(k \ge 1,\ \rho = \frac{\lambda}{\nu}\right)$$



Now (6) enables us to find the normalizing factor 

$$p_0 = \left[\sum_{k=0}^{n} \frac{\rho^k}{k!}\right]^{-1}$$

Finally, 

$$p_k = \frac{\rho^k / k!}{\sum_{j=0}^{n} \rho^j / j!}, \quad k = 0, 1, \ldots, n$$

These formulas were found by Erlang and are called Erlang's formulas. 
For $k = n$ we obtain the probability that all servers are busy 
and, consequently, the probability that every new demand (call) 
arriving in the system will be lost. Thus, the probability of rejection 
(congestion) is 

$$p_n = \frac{\rho^n / n!}{\sum_{j=0}^{n} \rho^j / j!}$$



To illustrate the rapidity of increase in the probability of losses 
with increasing $\rho$ (the load per server), we give the following tables. 
Here we confine ourselves solely to the cases of $n = 2$ and $n = 4$ 
and such values of $\rho$ for which, on the average, every server has 
the same intensity of incoming demands. 

n = 2 

ρ      0.1     0.3     0.5     1.0     2.0     3.0     4.0 
p_n    0.0045  0.0335  0.0769  0.2000  0.4000  0.5294  0.6154 

n = 4 

ρ      0.2     0.6     1.0     2.0     4.0     6.0     8.0 
p_n    0.0001  0.0030  0.0154  0.0952  0.3107  0.4696  0.5746 


Examining these tables we note that for small loads, an increase 
in the number of servers substantially reduces the probability of los¬ 
ses inasmuch as the probability of all servers being busy when there 
are a large number of servers is small. But then as the load on every 
server increases, the probability of losses gradually levels off. 
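
Erlang's formulas are easy to evaluate directly, and loss probabilities of the kind tabulated above can be recomputed this way. The following is a minimal sketch; the function names `erlang_state_probs` and `loss_probability` are illustrative and do not come from the text.

```python
from math import factorial

def erlang_state_probs(n, rho):
    """Stationary probabilities p_0..p_n of the loss system:
    p_k is proportional to rho**k / k!, normalized to sum to one."""
    weights = [rho ** k / factorial(k) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loss_probability(n, rho):
    """Probability p_n that all n servers are busy, so that an arriving demand is lost."""
    return erlang_state_probs(n, rho)[-1]

loads = {2: (0.1, 0.3, 0.5, 1.0, 2.0, 3.0, 4.0),
         4: (0.2, 0.6, 1.0, 2.0, 4.0, 6.0, 8.0)}
for n, rhos in loads.items():
    print(n, [round(loss_probability(n, rho), 4) for rho in rhos])
```

For loads much larger than those shown, the powers and factorials become large; a recursive evaluation of the ratio would then be numerically safer than the direct formula used here.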













Sec. 61. Birth and Death Processes 

Erlang’s problem and many other problems of queueing theory 
considered under the elementary assumptions spoken of at the end 
of the preceding section fit into a scheme that bears the name “birth 
and death processes”. This class of processes attracted attention in 
connection with biological problems of the sizes of population, the 
spread of epidemics, and so forth. Since the mathematical scheme 
underlying birth and death processes is of a sufficiently general na¬ 
ture, the theory was broadly applied to many other problems as well. 

Let us suppose that a certain system can at every instant of time 
be in one of the states $E_0, E_1, E_2, \ldots$, the set of which is finite or 
countable. The states of the system change with time, and during 
an interval of duration $h$ the system will pass from state $E_n$ at time 
$t$ to state $E_{n+1}$ with probability $\lambda_n h + o(h)$ and to state $E_{n-1}$ with 
probability $\nu_n h + o(h)$. The probabilities that during the time interval 
$(t, t+h)$ the system will pass to state $E_{n \pm k}$ for $k > 1$ are infinitely 
small in comparison with $h$. From this it follows that the probability 
of staying in the state $E_n$ during the same time interval is equal to 
$1 - \lambda_n h - \nu_n h + o(h)$. The constants $\lambda_n$ and $\nu_n$ are assumed to be dependent 
on $n$ but independent of $t$ and of how the system got to that state. 
This latter circumstance signifies that the process at hand is Markovian. 
The theory that will be given here can be extended to the case 
when the quantities $\lambda_n$ and $\nu_n$ are dependent on $t$. 

The stochastic process just described goes by the name birth and 
death process. If by $E_n$ is understood the event that the size of a population 
is $n$, then the transition $E_n \to E_{n+1}$ means that the size of the 
population has increased by unity. Similarly, the transition $E_n \to E_{n-1}$ 
is to be regarded as the death of one member of the population. 

If for any $n \ge 1$ the equalities $\nu_n = 0$ hold, that is, if only the transitions 
$E_n \to E_{n+1}$ are possible at the time of change of state, then the 
process is called a birth process (the phrase “a pure birth process” is 
sometimes used). But if all $\lambda_n = 0$ ($n = 0, 1, 2, \ldots$), then the process 
is a death process. 

The Poisson process that we studied in Sec. 51 is a birth process; 
here, $\lambda_n = \lambda$ for all $n \ge 0$. 

The Erlang problem that we examined in the preceding section 
is also a birth and death process for which $\lambda_k = \lambda$ when $0 \le k < n$ and 
$\lambda_k = 0$ for $k \ge n$; $\nu_k = 0$ for $k > n$ and $\nu_k = k\nu$ for $1 \le k \le n$. 

We denote by $p_k(t)$ the probability that the system we are studying 
is in state $E_k$ at time $t$. Reasoning as we did in Sec. 51 and when we 
considered the Erlang problem, we come to a system of differential 
equations that governs the birth and death process: 

$$p_0'(t) = -\lambda_0 p_0(t) + \nu_1 p_1(t) \tag{1}$$

and for $k \ge 1$ 

$$p_k'(t) = -(\lambda_k + \nu_k)\,p_k(t) + \lambda_{k-1}\,p_{k-1}(t) + \nu_{k+1}\,p_{k+1}(t) \tag{2}$$





Our notations are somewhat deficient in that we have not indicated 
from what state the system began to change. An exhaustive 
notation would be $p_{ij}(t)$, the probability that the system will, 
at time $t$, be in the state $E_j$ if at time 0 it was in the state $E_i$. 
In Sec. 51 and in Erlang’s problem we assumed that $E_0$ was the 
initial state. 

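Equations (1) and (2) become especially simple for processes of 
pure death and pure birth. In the latter instance, after performing 
successive integration (the formulas are written on the assumption 
that all $\lambda_k$ are different) we get 

$$p_0(t) = e^{-\lambda_0 t}$$

$$p_1(t) = \frac{\lambda_0}{\lambda_1 - \lambda_0}\left(e^{-\lambda_0 t} - e^{-\lambda_1 t}\right)$$

$$p_2(t) = \frac{\lambda_0 \lambda_1}{\lambda_1 - \lambda_0}\left[\frac{1}{\lambda_2 - \lambda_0}\left(e^{-\lambda_0 t} - e^{-\lambda_2 t}\right) - \frac{1}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)\right]$$

Here we assumed that for $t = 0$ the system is in state $E_0$. There 
is no difficulty in writing the general solution and seeing that the 
functions $p_k(t)$ are nonnegative for all $k$ and $t$. However, if the $\lambda_k$ 
grow too rapidly as $k$ increases, it may happen that $\sum_{k=0}^{\infty} p_k(t) < 1$. 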


Feller’s Theorem. In order that for all values of $t$ the solutions 
$p_k(t)$ of the equations of pure birth satisfy the relation 

$$\sum_{k=0}^{\infty} p_k(t) = 1 \tag{3}$$

it is necessary and sufficient that the series 

$$\sum_{k=0}^{\infty} \frac{1}{\lambda_k} \tag{4}$$

be divergent. 

Proof. Consider the partial sum of the series (3) 

$$S_n(t) = p_0(t) + p_1(t) + \cdots + p_n(t) \tag{5}$$

From the birth equations it follows that 

$$S_n'(t) = -\lambda_n p_n(t)$$

From this we find that 

$$1 - S_n(t) = \lambda_n \int_0^t p_n(z)\,dz \tag{6}$$

(if in place of the initial condition $p_0(0) = 1$ we take a different 
one, namely $p_i(0) = 1$, then Equation (6) holds for $n \ge i$). 



Since all terms of the sum (5) are nonnegative, for every fixed 
value of $t$ the sum $S_n(t)$ does not diminish with increasing $n$. 
Consequently, the limit 

$$\lim_{n \to \infty} (1 - S_n(t)) = \mu(t) \tag{7}$$

exists. 

By virtue of (6) we conclude that 

$$\lambda_n \int_0^t p_n(z)\,dz \ge \mu(t)$$

From this it is clear that 

$$\int_0^t S_n(z)\,dz \ge \mu(t)\left(\frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_n}\right)$$

Since for any $t$ and $n$ the inequality $S_n(t) \le 1$ holds, it follows 
that 

$$t \ge \mu(t)\left(\frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_n}\right)$$

If the series (4) is divergent, it follows from the last inequality 
that $\mu(t)$ should be 0 for all $t$. From (7) it now follows that the 
divergence of the series (4) leads to (3). 

From (6) it is clear that 

$$\lambda_n \int_0^t p_n(z)\,dz \le 1$$

and, hence, 

$$\int_0^t S_n(z)\,dz \le \frac{1}{\lambda_0} + \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_n}$$

In the limit, as $n \to \infty$, we get 

$$\int_0^t \left[1 - \mu(z)\right] dz \le \sum_{n=0}^{\infty} \frac{1}{\lambda_n}$$

If $\mu(t) = 0$ for all $t$, then the left-hand side of the inequality 
is equal to $t$ and since $t$ is arbitrary, the series on the right-hand 
side diverges. The theorem is proved. 

In Sec. 51, we had $\lambda_n = \lambda$ for $n \ge 0$. Consequently, the series 
(4) diverges and for all $t$ the equality $\sum_{n=0}^{\infty} p_n(t) = 1$ holds. 

The sum $\sum_{n=0}^{\infty} p_n(t)$ may be interpreted as the probability that 
during time $t$ there will occur only a finite number of changes of 
state of the system. Thus, the difference 

$$1 - \sum_{n=0}^{\infty} p_n(t)$$

should be interpreted as the probability of an infinite number of changes 
of state of the system during time $t$. In radioactive decay this possibility 
implies an avalanche type of disintegration. 
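
The role of the series $\sum 1/\lambda_k$ can be made tangible by computing its partial sums. In a pure birth process the mean holding time in state $E_k$ is $1/\lambda_k$, so the partial sum over the first states is the mean time needed to pass through them. The sketch below is illustrative only: the function name `mean_time_through_states` and the two rate sequences are assumptions chosen for the example.

```python
import math

def mean_time_through_states(rate, n_states):
    """Partial sum of 1/lambda_k for k = 0..n_states-1: the mean time a pure
    birth process started in E_0 needs to leave the first n_states states."""
    return sum(1.0 / rate(k) for k in range(n_states))

def constant_rate(k):
    """lambda_k = 2 for all k (a Poisson flow): the series diverges."""
    return 2.0

def quadratic_rate(k):
    """lambda_k = (k + 1)**2: the series converges to pi**2 / 6."""
    return float((k + 1) ** 2)

for n in (10, 1000, 100000):
    print(n,
          mean_time_through_states(constant_rate, n),
          mean_time_through_states(quadratic_rate, n))
print(math.pi ** 2 / 6)
```

With constant rates the partial sums grow without bound, so infinitely many changes of state cannot accumulate in a finite time. With quadratically growing rates the sums stay bounded, which is the situation in which $\sum p_k(t) < 1$ becomes possible.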

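Example 1. Stand-by Relief Without Repair. Imagine a system 
consisting of one main server and $n$ stand-by servers. During a time 
interval $(t, t+h)$ the main server can break down with a probability 
$\lambda h + o(h)$, and each of the stand-by servers (the so-called stand-by 
relief) can break down with a probability $\lambda' h + o(h)$. A server that 
has failed drops out of the system. The main server that has broken 
down is immediately replaced by one of the stand-by servers. The 
system as a whole breaks down as soon as all servers, both main and 
stand-by, fail. Find the probability that at time $t$ there are $k$ servers 
in the system that have broken down (event $E_k$). 

This is a case of pure birth. Here $\lambda_k = \lambda + (n - k)\lambda'$ ($0 \le k \le n$), 
$\lambda_{n+k} = 0$ ($k \ge 1$). Simple calculations yield the equations 

$$p_k(t) = \frac{\prod_{i=0}^{k-1}\left[\lambda + (n - i)\lambda'\right]}{k!\,\lambda'^{\,k}}\;e^{-[\lambda + (n - k)\lambda']\,t}\,\left(1 - e^{-\lambda' t}\right)^k \quad (0 \le k \le n)$$

and 

$$p_{n+1}(t) = \frac{\lambda \prod_{i=0}^{n-1}\left[\lambda + (n - i)\lambda'\right]}{n!\,\lambda'^{\,n}} \int_0^t e^{-\lambda z}\left(1 - e^{-\lambda' z}\right)^n dz$$

In particular, if $\lambda' = 0$ (nonloaded stand-by in which the stand-by servers 
do not break down) the following equalities hold: 

$$p_k(t) = \frac{(\lambda t)^k}{k!}\,e^{-\lambda t} \ \ (0 \le k \le n), \qquad p_{n+1}(t) = 1 - \sum_{k=0}^{n} \frac{(\lambda t)^k}{k!}\,e^{-\lambda t}$$

When $\lambda' = \lambda$ (loaded stand-by in which the stand-by servers are 
loaded just like the main server) 

$$p_k(t) = \binom{n+1}{k}\,e^{-(n + 1 - k)\lambda t}\,\left(1 - e^{-\lambda t}\right)^k$$

Denote by $\xi_k$ the lifetime of the $k$th element in an operating 
period. For nonloaded stand-by relief the length of life of the 
system is 

$$\xi_1 + \xi_2 + \cdots + \xi_{n+1}$$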

Since the average period of operation of one server is equal to 

$$\int_0^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}$$

the mean operating period of the system, in the case of “cold” 
stand-by relief, is equal to $\frac{n+1}{\lambda}$; that is, it is proportional to the 
total number of servers in the system. 

The mean operating time of a system without breakdown when 
the stand-by relief is loaded is calculated in the following manner: 
note the times of successive breakdowns $t_1, t_2, \ldots, t_{n+1}$ and introduce 
the notations $\tau_1 = t_1$, $\tau_2 = t_2 - t_1$, ..., $\tau_{n+1} = t_{n+1} - t_n$. 
Since all servers are in operation during the first interval, the probability 
that during time $t$ none will fail is equal to $e^{-(n+1)\lambda t}$; the 
probability that in the second interval not a single server will break 
down during time $t$ is equal to $e^{-n\lambda t}$, and so forth; finally, the probability 
that during time $t$ there will be no breakdowns in the $(n+1)$st 
interval is equal to $e^{-\lambda t}$. The mean operating time of the system is 

$$\sum_{k=1}^{n+1} \mathsf{M}\tau_k = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n+1}\right)$$

If $n$ is great, then 

$$\sum_{k=1}^{n+1} \mathsf{M}\tau_k \approx \frac{1}{\lambda}\left(\ln(n + 1) + C\right)$$

where $C$ is Euler’s constant. 

We see that with increasing number of stand-by servers the mean 
time of no-breakdown operation of the system increases much faster 
in the case of nonloaded stand-by relief than in the case of loaded re¬ 
lief. 
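
The difference between the two kinds of stand-by relief is easy to tabulate. The sketch below evaluates the mean life $(n+1)/\lambda$ for nonloaded relief and $\frac{1}{\lambda}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n+1}\right)$ for loaded relief; the function names and the chosen values of $n$ and $\lambda$ are illustrative assumptions.

```python
def mean_life_nonloaded(n, lam):
    """Mean operating time with n "cold" stand-by servers: the n + 1 servers
    work one after another, each for a mean time 1/lam."""
    return (n + 1) / lam

def mean_life_loaded(n, lam):
    """Mean operating time with n loaded stand-by servers: while j servers are
    alive, the mean time to the next failure is 1/(j*lam), j = n+1, ..., 1."""
    return sum(1.0 / j for j in range(1, n + 2)) / lam

lam = 1.0  # illustrative failure rate
for n in (0, 1, 4, 9, 99):
    print(n, mean_life_nonloaded(n, lam), round(mean_life_loaded(n, lam), 4))
```

The first column of results grows linearly in $n$, the second only logarithmically, which is the comparison made in the text.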

In the case of a pure birth process the system of Equations (l)-(2) 
was solved very simply by successive integration, since the differen¬ 
tial equations had the form of recursion relations. The general equa¬ 
tions of the birth and death process are of a different structure and 
the successive determination of the functions pk (t) is no longer pos¬ 
sible. The conditions of the existence and uniqueness of solutions of 
this system have been thoroughly studied in the works of Feller, 
Reuter, McGregor and Karlin. It was found that the equation 

$$\sum_{k=0}^{\infty} p_k(t) = 1$$


holds for all $t$ if the series 

$$\sum_{k=1}^{\infty} \prod_{i=1}^{k} \frac{\nu_i}{\lambda_i} \tag{8}$$

is divergent. 

If, in addition, the series 


$$\sum_{k=1}^{\infty} \prod_{i=1}^{k} \frac{\lambda_{i-1}}{\nu_i} \tag{9}$$

is convergent, then for all $k$ the limits 

$$p_k = \lim_{t \to \infty} p_k(t) \tag{10}$$

exist. 

In particular, this condition holds in all cases when, beginning with 
some $k_0$ onwards (for $k \ge k_0$), the inequality 

$$\frac{\lambda_k}{\nu_{k+1}} \le \alpha < 1$$

holds. Intuitively, these conditions are clear: what they mean is 
that the rate of arrival of calls (demands) in the service system must not 
exceed the speed of service. 

To determine the limits (10), it is necessary to solve a system 
of algebraic equations that is obtained from the system (1)-(2) if 
we put $p_k'(t) = 0$ and if we put $p_k$ in place of $p_k(t)$. This system 
then has the form 

$$\begin{aligned}
-\lambda_0 p_0 + \nu_1 p_1 &= 0 \\
-(\lambda_k + \nu_k)\,p_k + \lambda_{k-1}\,p_{k-1} + \nu_{k+1}\,p_{k+1} &= 0 \quad (k \ge 1)
\end{aligned} \tag{11}$$


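We introduce the notation 

$$z_k = -\lambda_k p_k + \nu_{k+1} p_{k+1}, \quad k = 0, 1, 2, \ldots$$

With this notation, the equations take the form 

$$z_0 = 0, \quad z_{k-1} - z_k = 0 \ \text{ (for } k \ge 1\text{)}$$

Whence it follows that for all $k$ 

$$z_k = 0$$

Consequently 

$$p_k = \frac{\lambda_{k-1}}{\nu_k}\,p_{k-1}$$

and 

$$p_k = p_0 \prod_{i=1}^{k} \frac{\lambda_{i-1}}{\nu_i} \tag{12}$$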



The constant $p_0$ is determined from the normalization condition 
$\sum_{k=0}^{\infty} p_k = 1$: 

$$p_0 = \left[1 + \sum_{k=1}^{\infty} \prod_{i=1}^{k} \frac{\lambda_{i-1}}{\nu_i}\right]^{-1} \tag{13}$$

It is obvious that these formulas contain the earlier obtained Erlang 
formulas. 
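
For a finite number of states, the product formula for the stationary probabilities translates directly into a few lines of code. The following is a minimal sketch; the function name `stationary_distribution` and the way the rates are passed (as two callables `birth` and `death`) are assumptions made for the illustration.

```python
def stationary_distribution(birth, death, n_states):
    """Stationary probabilities of a birth and death process on E_0..E_{n_states-1}.

    birth(k) returns lambda_k and death(k) returns nu_k. Each weight is the
    product of lambda_{i-1} / nu_i for i = 1..k; the weights are then
    normalized so that the probabilities sum to one.
    """
    weights = [1.0]
    for k in range(1, n_states):
        weights.append(weights[-1] * birth(k - 1) / death(k))
    total = sum(weights)
    return [w / total for w in weights]

# Erlang's loss system with n = 2 servers and lambda = nu = 1:
# lambda_k = 1 for k < 2, nu_k = k.
p = stationary_distribution(lambda k: 1.0, lambda k: float(k), 3)
print(p)
```

With these rates the weights are 1, 1 and 1/2, so the result is the Erlang distribution for two servers with $\rho = 1$. For a countable state space the same code can only be used after truncating the chain, which is reasonable when series (9) converges quickly.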

To illustrate the theory we consider the following examples. 


Example 2. A Service System with a Waiting Line (Queue). A Poisson 
flow of demands with parameter (intensity) $\lambda$ comes into $n$ identical 
servers. A demand that arrives at a server requires a random service 
time with probability distribution $H(x) = 1 - e^{-\nu x}$. If at the time of 
arrival of a demand there is at least one free server, the demand is serviced 
immediately. If all servers are busy, the new arrivals form a waiting line. 
If there is a waiting line, then as soon as a server completes a service 
session it immediately switches to servicing the next demand in line. 
The problem is to find the probability of one or another number of 
demands being in the service system. 

Our conditions are those developed in the theory in the present 
section; for our problem $\lambda_k = \lambda$ for all $k$, $\nu_k = k\nu$ for $k \le n$ and 
$\nu_k = n\nu$ for $k \ge n$. 

According to formulas (12) and (13) the stationary solutions for 
our problem are of the form: 

$$p_k = \frac{\rho^k}{k!}\,p_0$$

for $k \le n$ and 

$$p_k = \frac{\rho^k}{n!\,n^{k-n}}\,p_0$$

for $k \ge n$; here $\rho = \lambda/\nu$ and 

$$p_0 = \left[\sum_{k=0}^{n} \frac{\rho^k}{k!} + \frac{\rho^{n+1}}{n!\,(n - \rho)}\right]^{-1}$$

for $\rho < n$. 

It turns out that for $\rho \ge n$, $p_0 = 0$, and also at the same time $p_k = 0$ 
for all $k$. This result is very important and should be borne in mind 
in practical situations. In words it may be formulated as follows: 
in all cases in which $\rho \ge n$, the length of the queue increases without 
bound with time. 
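
The stationary probabilities of the system with a waiting line can be evaluated as follows. This is a sketch under the assumptions of the example ($\rho = \lambda/\nu < n$); the function name `queue_state_probs` and the truncation parameter `k_max` are introduced here for illustration only.

```python
from math import factorial

def queue_state_probs(n, rho, k_max):
    """Stationary probabilities p_0..p_{k_max} for n servers with an unlimited
    waiting line and load rho = lambda/nu; a stationary regime needs rho < n."""
    if rho >= n:
        raise ValueError("no stationary distribution: the queue grows without bound")
    p0 = 1.0 / (sum(rho ** k / factorial(k) for k in range(n + 1))
                + rho ** (n + 1) / (factorial(n) * (n - rho)))
    probs = []
    for k in range(k_max + 1):
        if k <= n:
            weight = rho ** k / factorial(k)
        else:
            weight = rho ** k / (factorial(n) * n ** (k - n))
        probs.append(weight * p0)
    return probs

probs = queue_state_probs(n=3, rho=2.0, k_max=200)
print(probs[:6], sum(probs))
```

For a single server ($n = 1$) the formulas reduce to the geometric distribution $p_k = (1 - \rho)\rho^k$, which gives a convenient check of the code.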





Example 3. Maintenance of Machines by a Team of Repairmen. 

A team of $r$ repairmen services $n$ machines of the same type ($r < n$). 
Each one of the machines may demand the attention of a repairman 
at random moments. The machines go out of commission independently 
of one another. The probability of dropping out of operation during 
the time interval $(t, t+h)$ is equal to $\lambda h + o(h)$. The probability that 
during time $(t, t+h)$ a machine will be put into operation again is 
equal to $\nu h + o(h)$. Each repairman can repair only one machine at 
a time; each machine is handled by only one repairman. The parameters 
$\lambda$ and $\nu$ are independent of $t$ and $n$ and also of the number of 
machines undergoing repair. Find the probability that in a steady-state 
process of service there will be a certain number of machines 
idle at a given time. 

By $E_k$ we denote the event that $k$ machines are out of commission 
at a given instant. It is obvious that our system can only be in 
the states $E_0, E_1, \ldots, E_n$. It is easy to see that we have to do 
with a birth and death process for which $\lambda_k = (n - k)\lambda$ for $0 \le k < n$, 
$\lambda_k = 0$ for $k \ge n$, $\nu_k = k\nu$ for $1 \le k \le r$, and $\nu_k = r\nu$ for $r \le k \le n$. 
Formulas (12) and (13) yield the equations: for $1 \le k \le r$ 

$$p_k = \frac{n!}{k!\,(n - k)!}\,\rho^k\,p_0$$

for $r \le k \le n$ 

$$p_k = \frac{n!}{r^{k-r}\,r!\,(n - k)!}\,\rho^k\,p_0$$

and 

$$p_0 = \left[\sum_{k=0}^{r} \frac{n!}{k!\,(n - k)!}\,\rho^k + \sum_{k=r+1}^{n} \frac{n!}{r^{k-r}\,r!\,(n - k)!}\,\rho^k\right]^{-1}$$

In particular, for $r = 1$ 

$$p_k = \frac{n!}{(n - k)!}\,\rho^k\,p_0, \qquad p_0 = \left[\sum_{k=0}^{n} \frac{n!}{(n - k)!}\,\rho^k\right]^{-1}$$

A simple numerical calculation will serve to illustrate these 
formulas. Suppose eight machines are serviced by two repairmen. 
What is the best way to organize the work: put both repairmen 
in charge of all the machines so that the workman that is free 
at the moment attends any machine that has just stopped, or 
assign four definite machines to each repairman? The calculations 
are carried out on the assumption that $\rho = 0.2$. The results are 
tabulated. 



n = 8, r = 2 

Number of       Number of machines    Number of 
nonoperating    awaiting              repairmen 
machines        servicing             idle          p_k 

    0               0                     2         0.2048 
    1               0                     1         0.3277 
    2               0                     0         0.2294 
    3               1                     0         0.1417 
    4               2                     0         0.0687 
    5               3                     0         0.0275 
    6               4                     0         0.0083 
    7               5                     0         0.0017 
    8               6                     0         0.0002 

The mean number of machines idle at a given time for the reason 
that the repairmen are busy with other machines is 

$$\sum_{k=2}^{8} (k - 2)\,p_k = 0.3045$$

The total idling time of the machines (repair and waiting for 
repair) is equal to 

$$\sum_{k=1}^{8} k\,p_k = 1.6875$$

The mean duration of free time of the repairmen is 

$$2 \times 0.2048 + 1 \times 0.3277 = 0.7373$$

In other words, each repairman is idle during 0.3686 working day. 


n = 4, r = 1 

Number of       Number of machines    Number of 
nonoperating    awaiting              repairmen 
machines        servicing             idle          p_k 

    0               0                     1         0.3984 
    1               0                     0         0.3189 
    2               1                     0         0.1914 
    3               2                     0         0.0760 
    4               3                     0         0.0153 












The mean time of unproductive idling of machines (waiting for 
onset of repair work) is 

$$1 \times 0.1914 + 2 \times 0.0760 + 3 \times 0.0153 = 0.3893$$

The whole group of eight machines will lose 0.7786 working day, 
which means the loss of working time of the machines due to waiting 
for repairs will more than double compared with the first type of 
labour organization. The total loss of time for the four machines 
(waiting + repairs) is 

$$1 \times 0.3189 + 2 \times 0.1914 + 3 \times 0.0760 + 4 \times 0.0153 = 0.9909$$

Thus all eight machines will lose 1.9818 working days. On the average, 
a repairman is free during 0.3984 working day, which means he is 
less engaged, though the machines stand idle for a longer time. 
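
The comparison of the two ways of organizing the work can be repeated numerically. The sketch below evaluates the stationary probabilities of Example 3 together with three averages; all function names are illustrative assumptions. Recomputed probabilities come out close to the tabulated ones, though differences of up to about 0.005 may appear, and the averages derived from them need not coincide digit for digit with the printed figures.

```python
from math import factorial

def repair_state_probs(n, r, rho):
    """Stationary probabilities p_0..p_n that k machines are out of commission,
    for n machines, r repairmen and rho = lambda/nu."""
    weights = []
    for k in range(n + 1):
        if k <= r:
            w = factorial(n) / (factorial(k) * factorial(n - k)) * rho ** k
        else:
            w = factorial(n) / (r ** (k - r) * factorial(r) * factorial(n - k)) * rho ** k
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]

def mean_waiting_machines(p, r):
    """Mean number of machines waiting because all r repairmen are busy."""
    return sum((k - r) * pk for k, pk in enumerate(p) if k > r)

def mean_down_machines(p):
    """Mean number of machines out of commission (under repair or waiting)."""
    return sum(k * pk for k, pk in enumerate(p))

def mean_idle_repairmen(p, r):
    """Mean number of repairmen with nothing to do."""
    return sum((r - k) * pk for k, pk in enumerate(p) if k < r)

for n, r in ((8, 2), (4, 1)):
    p = repair_state_probs(n, r, 0.2)
    print(n, r, [round(x, 4) for x in p])
    print(round(mean_waiting_machines(p, r), 4),
          round(mean_down_machines(p), 4),
          round(mean_idle_repairmen(p, r), 4))
```

To compare the two schemes on an equal footing, the averages for the group of four machines with one repairman have to be doubled, as is done in the text.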


Sec. 62. Single-Server Queueing System 

For the case when a service system has only one server, the queueing 
problem may be solved on the basis of much broader premises than 
those lying at the root of birth and death processes. This case is of 
particular interest from the standpoint of applications since one often 
has to do precisely with one server, or the flow of demands is divi¬ 
ded in advance between servers according to a definite principle that 
is independent of the load on the servers at the time of arrival of a 
demand. Following the terminology of telephony, the case of one ser¬ 
ver goes by the name of a single-server (one-channel) system. 

Suppose that the arrival times of the demands $t_1, t_2, \ldots, t_r, \ldots$ 
are such that the quantities $z_0 = t_1$, $z_1 = t_2 - t_1$, ..., $z_r = t_{r+1} - t_r$, ... 
are mutually independent and identically distributed. We take it that 

$$F(x) = P\{z_r < x\}$$

and 

$$a = \mathsf{M}z_r = \int_0^{\infty} t\,dF(t) < +\infty$$

Demands arriving in the system are serviced immediately if the 
server is free, and get in a waiting line if it is busy servicing an 
earlier arrival. Servicing a demand occupies a random time $\gamma_r$, 
where $r$ is the serial number of an arriving demand. We assume 
that 

$$G(x) = P\{\gamma_r < x\}$$

and 

$$b = \mathsf{M}\gamma_r = \int_0^{\infty} x\,dG(x) < +\infty$$

Our problem is to find the distribution of waiting time before 
servicing has begun. We denote this quantity by $w_r$ for the $r$th 
arrival. For the sake of definiteness let $w_1 = 0$; that is, the server 
is assumed to be free at the start. 



[Fig. 21] 


It will readily be seen that we have the following equations: 

$$w_{r+1} = \begin{cases} w_r + \gamma_r - z_r & \text{if } w_r + \gamma_r - z_r > 0 \\ 0 & \text{if } w_r + \gamma_r - z_r \le 0 \end{cases} \tag{1}$$

The first equation is well illustrated graphically in Fig. 21. 
We put $u_r = \gamma_r - z_r$ and introduce the notation 

$$U_r(x) = P\{\gamma_r - z_r < x\}$$

Since for $r \ge 1$, $\gamma_r$ and $z_r$ have distributions that are independent 
of $r$, the distribution of $u_r$ is likewise independent of $r$ for $r \ge 1$. 
We denote it by $U(x)$. 
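
Relation (1) is a recursion that can be applied directly to observed or simulated data. The following is a minimal sketch; the function name `waiting_times` and the small data set are illustrative assumptions, with `service_times[i]` and `interarrival_times[i]` standing for the service time of one demand and the interval until the arrival of the next.

```python
def waiting_times(service_times, interarrival_times, w_first=0.0):
    """Successive waiting times from the recursion
    w_{r+1} = max(0, w_r + service_r - interval_r), starting from w_first."""
    w = [w_first]
    for service, interval in zip(service_times, interarrival_times):
        w.append(max(0.0, w[-1] + service - interval))
    return w

# Three demands with service times 3, 1, 2; the next demand arrives
# 1, 4, 1 time units after each of them.
print(waiting_times([3, 1, 2], [1, 4, 1]))
```

In this small example the second demand waits 2 units, the third finds the server free, and the fourth waits 1 unit. Only the differences between service time and interarrival interval enter the recursion.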

Further, denote the distribution function of $w_r$ by $L_r(x)$. By 
virtue of the nonnegativity of the variables $w_r$, for all $r$ and 
$x \le 0$, we have the equation 

$$L_r(x) = 0$$

Since $w_1 = 0$ by assumption, for $x > 0$ we have 

$$L_1(x) = 1$$

Relation (1) permits establishing the connection that exists 
between $L_r(x)$ and $L_{r+1}(x)$. Indeed, by (1), for $x > 0$, we have 

$$\begin{aligned}
L_{r+1}(x) &= P\{w_{r+1} < x\} = P\{w_{r+1} = 0\} + P\{0 < w_{r+1} < x\} \\
&= P\{w_r + u_r \le 0\} + P\{0 < w_r + u_r < x\} \\
&= P\{w_r + u_r < x\}
\end{aligned} \tag{2}$$

The variables $w_r$ and $u_r$ are independent, and so 

$$L_{r+1}(x) = \int_{-\infty}^{\infty} L_r(x - v)\,dU_r(v) = \int_{-\infty}^{x} L_r(x - v)\,dU_r(v) \tag{3}$$





From this it is possible, knowing $L_1(x)$ and $U(x)$, to determine 
successively the functions $L_2(x)$, $L_3(x)$, .... In particular, when $x > 0$, 

$$L_2(x) = \int_{-\infty}^{x} L_1(x - v)\,dU(v) = U(x)$$

and 

$$L_3(x) = \int_{-\infty}^{x} L_2(x - v)\,dU(v) = \int_{-\infty}^{x} U(x - v)\,dU(v)$$

The last two equations permit formulating the following result: 
the distribution of waiting time does not depend on the distributions 
$F(x)$ and $G(x)$ themselves, but only on the distribution $U(x)$. 

Note that the distribution $L_2(x)$ does not coincide fully with the 
distribution $U(x)$ since $L_2(x) = 0$ for $x \le 0$ whereas it is possible to 
find $x < 0$ such that for them $U(x) > 0$. The probability that the second 
(in order of arrival) demand will be serviced without waiting 
is equal to $L_2(+0) - L_2(-0) = U(+0)$. The relation (2) yields more; 
namely, since $w_r \ge 0$ for any $r$, we can write the following sequence 
of equations for $x > 0$: 

$$\begin{aligned}
L_2(x) &= P\{u_1 < x\} \\
L_3(x) &= P\{u_1 + u_2 < x,\ u_2 < x\} \\
L_4(x) &= P\{u_1 + u_2 + u_3 < x,\ u_2 + u_3 < x,\ u_3 < x\} \\
&\;\;\vdots \\
L_{r+1}(x) &= P\left\{\sum_{j=s}^{r} u_j < x,\ s = 1, 2, \ldots, r\right\}
\end{aligned}$$

Let us now consider the behaviour of $L_r(x)$ as $r \to \infty$. It will 
readily be seen that for any $x$, as $r \to \infty$, the functions $L_r(x)$ 
tend to a limiting value. Indeed, consider for this purpose the 
event $E_r$ that 

$$\sum_{j=1}^{s} u_j < x \quad \text{for all } s,\ 1 \le s \le r$$

It is obvious that every $E_r$ implies events $E_s$ with smaller 
subscripts. If by $E$ we denote the event 

$$\sum_{j=1}^{s} u_j < x \quad \text{for all } s \ge 1$$

then, by virtue of the continuity axiom, 

$$\lim_{r \to \infty} L_{r+1}(x) = \lim_{r \to \infty} P\{E_r\} = P\{E\}$$

If we introduce the notation 

$$L(x) = P\{E\} = P\left\{\sum_{j=1}^{s} u_j < x;\ s \ge 1\right\}$$

then according to the preceding discussion Equation (3) will pass 
into 

$$L(x) = \int_{-\infty}^{x} L(x - z)\,dU(z) \tag{4}$$

which, by an obvious change of variables, becomes 

$$L(x) = \int_{0}^{\infty} L(y)\,dU(x - y) \tag{5}$$

Now let us investigate the behaviour of the function $L(x)$ under various 
assumptions relative to the function $U(x)$, more precisely, 
relative to its first moment (the expectation of the variable 
$u_i$). We know that according to the strong law of large numbers 
we have the equation 

$$P\left\{\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} u_i = \mathsf{M}u_1 = b - a\right\} = 1 \tag{6}$$

From this, for $\mathsf{M}u_1 > 0$, we conclude that with probability one 
there is an $n_0$ such that for all $n > n_0$ the inequality 

$$\sum_{i=1}^{n} u_i > \frac{n}{2}\,\mathsf{M}u_1$$

holds. 

For every $x > 0$ there will be an $n$ so large that $\frac{n}{2}\,\mathsf{M}u_1 > x$. 
The preceding inequality shows that with probability one we have 
the inequality $\sum_{i=1}^{n} u_i > x$ for sufficiently large $n$. 

This means that for any $x > 0$ 

$$L(x) = 0$$

The actual meaning of this equality is: with probability one the 
waiting time of the $n$th demand that has arrived in the service 
system exceeds any $x > 0$ as $n$ increases to infinity (in other 
words, as the time of operation of the service system approaches 
infinity). 

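Now let $\mathsf{M}u_1 < 0$, that is, $b < a$; on the average, the service time 
is less than the interval between successive arrivals of demands for 
service. We will prove that in this case $L(x)$ is a distribution 
function, that is, that $L(+\infty) = 1$. Indeed, by virtue of (6), for every 
$\delta > 0$ an $n_0$ will be found such that 

$$P\left\{\sum_{i=1}^{n} u_i < 0 \ \text{ when } n > n_0\right\} > 1 - \frac{\delta}{2}$$

But for every $\delta$ an $x$ may be found such that 

$$P\left\{\sum_{i=1}^{n} u_i < x \ \text{ when } 1 \le n \le n_0\right\} > 1 - \frac{\delta}{2}$$

Thus, for every $\delta > 0$ there will be an $x$ such that 

$$P\left\{\sum_{i=1}^{n} u_i < x \ \text{ when } n \ge 1\right\} > 1 - \delta$$

We therefore conclude that 

$$L(+\infty) = \lim_{x \to \infty} P\left\{\sum_{i=1}^{n} u_i < x;\ n \ge 1\right\} = 1$$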


The case of $\mathsf{M}u_1 = 0$ requires more profound methods of investigation 
than the strong law of large numbers. It turns out that here, 
with the exception of the possibility of $u_1 = 0$, when the service time 
is exactly equal to the interval prior to the arrival of the next demand, 
for every $x > 0$ we have the equality $L(x) = 0$. In other words, if on 
the average the service time is equal to the mean duration of the time 
interval between two successive arrivals of demands for service, 
then, with the exception of the trivial case mentioned above, the 
waiting line for service will increase without bound with time. This 
result is important both for theory and applications. 

Now let us solve Equation (5). To do this, we compute the characteristic 
function of the integral 

$$S(x) = \int_{-\infty}^{\infty} L(x - y)\,dU(y)$$

in two different ways. In the process, we introduce an additional 
assumption: for $x > 0$ 

$$F(x) = 1 - e^{-\lambda x}$$

in other words, we assume the traffic of demands for service to be 
elementary with parameter $\lambda$. 

Since $S(x)$ is the distribution of the algebraic sum $w - z + \gamma$, 
the characteristic function of $S(x)$ is 

$$s(t) = l(t)\,g(t)\,\frac{\lambda}{\lambda + it}$$

Here, $g(t)$ and $l(t)$ denote, respectively, the characteristic functions 
for the distributions $G(x)$ and $L(x)$, and as will readily be 
seen, $\frac{\lambda}{\lambda + it}$ is the characteristic function of the variable $-z$. 

By definition, 

$$U(x) = P\{\gamma - z < x\} = \lambda \int_0^{\infty} G(x + y)\,e^{-\lambda y}\,dy$$

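According to (5), 

$$S(x) = L(x)$$

for $x > 0$. When $x \le 0$ 

$$S(x) = \int_{-\infty}^{0} L(-z)\,d_z U(x + z) = \lambda \int_{-\infty}^{0} L(-z)\,d_z \int_0^{\infty} G(x + y + z)\,e^{-\lambda y}\,dy$$

But $x$ and $z$ are negative, and so $G(x + y + z) = 0$ in the interval 
of $y$ from 0 to $-(x + z)$. Thus, 

$$S(x) = \lambda \int_{-\infty}^{0} L(-z)\,d_z \int_{-(x+z)}^{\infty} G(x + y + z)\,e^{-\lambda y}\,dy = \lambda \int_{-\infty}^{0} L(-z)\,d_z \left(e^{\lambda(x + z)} \int_0^{\infty} G(v)\,e^{-\lambda v}\,dv\right) = c\,e^{\lambda x}$$

where 

$$c = \int_{-\infty}^{0} L(-z)\,\lambda^2 e^{\lambda z}\,dz \int_0^{\infty} G(v)\,e^{-\lambda v}\,dv$$

We now find 

$$s(t) = \int_{-\infty}^{\infty} e^{itx}\,dS(x) = c\lambda \int_{-\infty}^{0} e^{(\lambda + it)x}\,dx + \int_0^{\infty} e^{itx}\,dL(x) - c = \frac{c\lambda}{\lambda + it} + l(t) - c$$

Equating the two expressions for $s(t)$, we find that 

$$l(t) = \frac{c\,it}{it + \lambda\,(1 - g(t))}$$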



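Now compute the constant $c$. To do this, note that 

$$1 = l(0) = \lim_{t \to 0} \frac{c\,it}{it + \lambda\,(1 - g(t))} = \frac{c}{1 - \lambda b}$$

And, finally, 

$$l(t) = \frac{1 - \lambda b}{1 - \lambda\,\dfrac{g(t) - 1}{it}}$$

Differentiation of this formula leads to the formulas of expectation 
and variance of the variable $w$, which is the waiting time 
for service of an arrival (demand): 

$$\mathsf{M}w = \frac{\lambda}{2\,(1 - \lambda b)} \int_0^{\infty} x^2\,dG(x)$$

and 

$$\mathsf{D}w = (\mathsf{M}w)^2 + \frac{\lambda}{3\,(1 - \lambda b)} \int_0^{\infty} x^3\,dG(x)$$

This formula shows that $\sqrt{\mathsf{D}w} \ge \mathsf{M}w$, or that considerable fluctuations 
are possible in the waiting time. 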

The foregoing formulas were obtained by A. Ya. Khinchin; the
theory developed in the first half of this section is due to D. Lindley.
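As a numerical illustration, the following minimal sketch evaluates the expectation and variance of the waiting time from the first three moments of the service time. It assumes the formulas read $Mw = \lambda m_2 / (2(1-\lambda m_1))$ and $Dw = (Mw)^2 + \lambda m_3 / (3(1-\lambda m_1))$, where $m_k = \int_0^\infty x^k\,dG(x)$; the function name and the sample parameters are hypothetical and chosen only for the example.

```python
def waiting_time_moments(lam, m1, m2, m3):
    """Expectation and variance of the waiting time w.

    lam        -- intensity of the elementary flow of demands (lambda)
    m1, m2, m3 -- first three moments of the service time distribution G
    """
    load = lam * m1
    if load >= 1:
        # In this case the waiting line grows without bound.
        raise ValueError("requires lam * m1 < 1")
    mean = lam * m2 / (2 * (1 - load))
    var = mean ** 2 + lam * m3 / (3 * (1 - load))
    return mean, var


# Exponential service time with parameter mu: moments 1/mu, 2/mu^2, 6/mu^3.
lam, mu = 1.0, 2.0
mean, var = waiting_time_moments(lam, 1 / mu, 2 / mu ** 2, 6 / mu ** 3)
print(mean, var)
```

For these sample values the standard deviation exceeds the mean, which agrees with the remark on fluctuations of the waiting time.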


Sec. 63. A Limit Theorem for Flows

We have already stated that the overwhelming majority of studies
in the theory of queues and in reliability theory proceed, at the
present time, from the assumption that the incoming flow of demands
(or, in reliability theory, the flow of breakdowns) is elementary.
In a number of practically important cases the initial assumptions
that in Sec. 51 served as a basis for deriving the form of an elementary
flow do not follow from any consideration of the physical picture of
the phenomenon. Indeed, in certain problems we find significant
deviations of actual traffic from the elementary type. It would seem
that such deviations, due to the enormous diversity of conditions
under which actual phenomena occur, should be the rule and not the
exception. However, it appears that great deviations are incomparably
more rare than might be expected on the basis of a priori reasoning.
The problem thus arises of determining the causes by virtue of which
elementary traffic is so often in good agreement with the course of
real flows. In recent years this problem has been investigated in a large
number of works. We confine ourselves here to only one model that
leads to Poisson flows (these include elementary flows as well).





Suppose that the observed flow is the sum of a large number of
independent flows of small intensity. Then, as will be demonstrated,
the overall flow will, under very broad conditions, be almost Poisson.
Problems very often involve total flows. The traffic of calls at a
telephone exchange may be regarded as the sum of flows of individual
subscribers. The incoming traffic of cargo ships at a port is the sum
of the flows of the ships leaving from various other ports. The flow
of breakdowns of an intricate device is the sum of the flows of
breakdowns of its elements. The flow of calls for emergency medical aid
is also made up of a very large number of calls from individuals. The
list of concrete examples could be extended, but there is no need since
anyone could add large numbers of cases from familiar spheres of
activity. The thing to note here is that of particular interest is a
consideration of such total flows, the component flows of which are
uniformly small in some definite sense. We shall now prove a limit theorem
for total flows on the basis of the ideas and methods given in Chapter 9.
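The claim that a sum of many sparse independent flows is almost Poisson can be tried out numerically. The sketch below is only an illustration under stated assumptions: each component flow is taken to be strictly periodic with a random phase (a choice made here because such a flow is far from elementary), and the names `superposed_count`, `empirical_pmf` and the sample sizes are hypothetical.

```python
import math
import random


def superposed_count(n_flows, window, rng):
    """Number of events in [0, window) of n_flows superposed periodic flows.

    Each flow emits one event every n_flows time units, starting from an
    independent uniformly distributed phase, so each flow has intensity
    1/n_flows and the total intensity equals 1.
    """
    period = float(n_flows)
    total = 0
    for _ in range(n_flows):
        t = rng.uniform(0.0, period)  # time of the first event of this flow
        while t < window:
            total += 1
            t += period
    return total


def empirical_pmf(n_flows, window, trials, seed=0):
    """Relative frequencies of the total count over independent trials."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(trials):
        k = superposed_count(n_flows, window, rng)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / trials for k, c in counts.items()}


def poisson_pmf(k, a):
    return a ** k * math.exp(-a) / math.factorial(k)


pmf = empirical_pmf(n_flows=200, window=3.0, trials=10000)
# Largest deviation of the observed frequencies from the Poisson law with a = 3.
max_gap = max(abs(pmf.get(k, 0.0) - poisson_pmf(k, 3.0)) for k in range(15))
print(max_gap)
```

The observed frequencies stay close to the Poisson probabilities with parameter 3, although no single component flow resembles an elementary one.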

A stochastic process $X(t)$ will be called a step process if for any $t$
and $s$ ($0 \leq s < t$) the increments $X(t) - X(s)$ can take on only
nonnegative integral values. We assume that $X(0)=0$. This means that the
process began only at time $t=0$. The value of the process $X(t)$ may
be interpreted to mean the number of occurrences of certain events
during the time interval between 0 and $t$. Such events are telephone
calls arriving at a telephone exchange, arrival of clients at a barber
shop, breakdown of electronic equipment, and so forth. Note that
a step process can change only at specific points, and there it may change
at once by a whole number of units. This may be regarded as the
simultaneous arrival of several demands for service. Actual situations of
this kind are rather frequent: the arrival in a port of a number of barges
towed by a single tugboat, the arrival at a hospital of an ambulance with
several people injured in an automobile accident, and so on.

Let

$$X_n(t) = \sum_{r=1}^{k_n} X_{nr}(t)$$

where $X_{nr}(t)$ are mutually independent step processes. It is obvious
that the process $X_n(t)$ is also a step process.

We will say that the sequence of processes $X_n(t)$ converges weakly
to the process $X(t)$ if the distribution function of the vectors

$$X_n(t_1),\ X_n(t_2),\ \ldots,\ X_n(t_k)$$

for any choice of $k, t_1, t_2, \ldots, t_k$ at each point of continuity
converges to the value of the distribution function of the vector

$$X(t_1),\ X(t_2),\ \ldots,\ X(t_k)$$





We say that the process $X(t)$ is a Poisson process with leading
function $\Lambda(t)$ if it: (1) has independent increments in nonoverlapping
intervals and (2) for all $s<t$ and for any nonnegative integral $k$

$$P\{X(t) - X(s) = k\} = \frac{[\Lambda(t) - \Lambda(s)]^k}{k!}\,e^{-[\Lambda(t) - \Lambda(s)]}$$

The leading function $\Lambda(t)$ is nonnegative, continuous on the left,
finite for every $t$, and for $t \leq 0$ satisfies the equality $\Lambda(t)=0$. For
the elementary flow studied in Sec. 51, $\Lambda(t) = \lambda t$.

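Let us introduce the following notation:

$$p_{nr}(k; s, t) = P\{X_{nr}(t) - X_{nr}(s) = k\}, \quad k = 0, 1, 2, \ldots \qquad (1)$$

$$A_n(s, t) = \sum_{r=1}^{k_n} p_{nr}(1; s, t) \qquad (2)$$

$$B_n(s, t) = \sum_{r=1}^{k_n} [1 - p_{nr}(0; s, t) - p_{nr}(1; s, t)] \qquad (3)$$

Of the processes $X_{nr}(t)$ we will say that they are
infinitesimal if for every fixed $t$

$$\lim_{n \to \infty} \max_{1 \leq r \leq k_n} [1 - p_{nr}(0; 0, t)] = 0 \qquad (4)$$

In other words, the processes $X_{nr}(t)$ are infinitesimal if for any
$\varepsilon > 0$ and an arbitrary fixed $t$ it is possible to indicate an $n$ such
that for all $r$ at once

$$P\{X_{nr}(t) > \varepsilon\} < \varepsilon$$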

We will now state and prove a limit theorem under conditions
due to B. I. Grigelionis.

Theorem. For the convergence of the sums

$$X_n(t) = \sum_{r=1}^{k_n} X_{nr}(t)$$

of mutually independent infinitesimal processes $X_{nr}(t)$ to the Poisson
process with leading function $\Lambda(t)$, it is necessary and sufficient
that for any fixed $s$ and $t$ ($s<t$) the following relations should hold:

$$\lim_{n \to \infty} A_n(s, t) = \Lambda(t) - \Lambda(s) \qquad (5)$$

and

$$\lim_{n \to \infty} B_n(0, t) = 0 \qquad (6)$$

Proof. The proof of the necessity of the hypothesis of the theorem
is based on the following proposition of the theory of summation




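of independent random variables. If the independent random variables
$\xi_{n1}, \xi_{n2}, \ldots, \xi_{nk_n}$ are infinitesimal, that is, for any
$\varepsilon > 0$ and as $n \to \infty$

$$\sup_{1 \leq k \leq k_n} P\{|\xi_{nk}| > \varepsilon\} \to 0$$

then in order that the distribution functions of the sums

$$\xi_n = \xi_{n1} + \xi_{n2} + \ldots + \xi_{nk_n}$$

should converge to the Poisson distribution

$$P(x) = \sum_{0 \leq k < x} \frac{\lambda^k e^{-\lambda}}{k!}$$

as $n$ tends to infinity, it is necessary and sufficient that the
following conditions be fulfilled: for every $\varepsilon$ ($0 < \varepsilon < 1$) as $n$ tends to
infinity,

$$(1) \quad \sum_{k=1}^{k_n} \int_{R_\varepsilon} dF_{nk}(x) \to 0$$

$$(2) \quad \sum_{k=1}^{k_n} \int_{|x-1| < \varepsilon} dF_{nk}(x) \to \lambda$$

$$(3) \quad \sum_{k=1}^{k_n} \int_{|x| < \varepsilon} x\,dF_{nk}(x) \to 0$$

$$(4) \quad \sum_{k=1}^{k_n} \left[ \int_{|x| < \varepsilon} x^2\,dF_{nk}(x) - \left( \int_{|x| < \varepsilon} x\,dF_{nk}(x) \right)^2 \right] \to 0$$

Here, the following notations are introduced: $F_{nk}(x) = P\{\xi_{nk} < x\}$, and
$R_\varepsilon$ is the region obtained from the real infinite line by eliminating
the intervals $|x| < \varepsilon$ and $|x-1| < \varepsilon$.

It will be noted that in the Grigelionis theorem we have to put

$$\lambda = \Lambda(t) - \Lambda(s)$$

$$p_{nk}(1; s, t) = \int_{|x-1| < \varepsilon} dF_{nk}(x)$$

$$1 - p_{nk}(0; s, t) - p_{nk}(1; s, t) = \int_{R_\varepsilon} dF_{nk}(x)$$

It is now clear that the first and second conditions of the foregoing
theorem on convergence to the Poisson distribution coincide exactly
with the conditions (6) and (5), respectively. The third and fourth
conditions of this theorem are automatically fulfilled for step processes,
since in the interval $|x| < \varepsilon$ their distribution functions have only one
point of increase, $x=0$.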





Thus, the necessity of the Grigelionis conditions follows from the
fact that if the processes under consideration converge, then their
one-dimensional distributions should converge as well.

We shall now prove that the conditions of the theorem are sufficient.
To do this, it is obviously sufficient to demonstrate that the
conditions (5) and (6) ensure both asymptotic independence of increments
of the process $X_n(t)$ and convergence of one-dimensional distributions
to the corresponding Poisson distributions. Incidentally, the second
part of our program follows completely from the theorem we formulated
concerning the convergence of the distribution functions of
sums to the Poisson distribution.

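Let us consider the vectors

$$\bar{l} = (l_1, l_2, \ldots, l_m), \quad \text{where } l_\nu \geq 0 \text{ are integers}$$

$$\bar{0} = (0, 0, \ldots, 0)$$

$$\bar{l}_\nu = (\underbrace{0, 0, \ldots, 0}_{\nu - 1}, 1, \underbrace{0, 0, \ldots, 0}_{m - \nu})$$

$$T = (t_0, t_1, \ldots, t_m), \quad 0 \leq t_0 < t_1 < \ldots < t_m$$

$$\bar{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_m), \quad \text{where } \alpha_\nu \text{ are arbitrary real numbers}$$

$$\bar{X}_n(T) = \sum_{r=1}^{k_n} \bar{X}_{nr}(T)$$

where $\bar{X}_{nr}(T)$ is understood as the vector of increments
$(X_{nr}(t_1) - X_{nr}(t_0), \ldots, X_{nr}(t_m) - X_{nr}(t_{m-1}))$.

We also introduce the following supplementary notations:

$$(\bar{\alpha}, \bar{\beta}) = \sum_{\nu=1}^{m} \alpha_\nu \beta_\nu$$

$$p_{nr}(\bar{l}, T) = P\{\bar{X}_{nr}(T) = \bar{l}\}$$

$$f_{nr}(\bar{\alpha}, T) = M \exp i(\bar{\alpha}, \bar{X}_{nr}(T))$$

$$f_n(\bar{\alpha}, T) = M \exp i(\bar{\alpha}, \bar{X}_n(T))$$

For the distributions of the vectors $\bar{X}_n(T)$ to converge to the
corresponding distributions of the Poisson process as $n \to \infty$ it is
sufficient that their characteristic functions converge. Let us
verify this.

By virtue of the independence of the processes $X_{nr}(t)$, we have

$$f_n(\bar{\alpha}, T) = \prod_{r=1}^{k_n} f_{nr}(\bar{\alpha}, T)$$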





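But

$$f_{nr}(\bar{\alpha}, T) = \sum_{\bar{l}} p_{nr}(\bar{l}, T)\,e^{i(\bar{\alpha}, \bar{l})} = 1 + \sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T)\,(e^{i(\bar{\alpha}, \bar{l})} - 1)$$

where the symbol $\sum_{\bar{l}}$ denotes summation over all possible integral
vectors $\bar{l}$ with nonnegative components.

For small $x$

$$1 + x = \exp[x + O(x^2)]$$

therefore,

$$f_{nr}(\bar{\alpha}, T) = \exp\left\{ \sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T)\,(e^{i(\bar{\alpha}, \bar{l})} - 1) + O\left[\left(\sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T)\right)^2\right] \right\}$$

$$= \exp\left\{ \sum_{\nu=1}^{m} p_{nr}(\bar{l}_\nu, T)\,(e^{i\alpha_\nu} - 1) + O\left(\sum_{\substack{\bar{l} \neq \bar{0},\ \bar{l} \neq \bar{l}_\nu \\ \nu = 1, \ldots, m}} p_{nr}(\bar{l}, T)\right) + O\left[\left(\sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T)\right)^2\right] \right\}$$

It is obvious that

$$\sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T) = 1 - P\{\bar{X}_{nr}(T) = \bar{0}\} \leq 1 - P\{X_{nr}(t_m) = 0\} = 1 - p_{nr}(0; 0, t_m)$$

$$\sum_{\substack{\bar{l} \neq \bar{0},\ \bar{l} \neq \bar{l}_\nu \\ \nu = 1, \ldots, m}} p_{nr}(\bar{l}, T) = P\{X_{nr}(t_m) - X_{nr}(t_0) \geq 2\} \leq P\{X_{nr}(t_m) \geq 2\} \qquad (7)$$

$$\sum_{\bar{l} \neq \bar{0}} p_{nr}(\bar{l}, T) \leq \sum_{\nu=1}^{m} p_{nr}(\bar{l}_\nu, T) + P\{X_{nr}(t_m) \geq 2\}$$

We note that

$$p_{nr}(1; t_{\nu-1}, t_\nu) - p_{nr}(\bar{l}_\nu, T) = P\{X_{nr}(t_\nu) - X_{nr}(t_{\nu-1}) = 1,\ X_{nr}(t_{\nu-1}) - X_{nr}(t_0) + X_{nr}(t_m) - X_{nr}(t_\nu) > 0\} \leq P\{X_{nr}(t_m) \geq 2\} \qquad (8)$$

The relations (7) and (8) permit us to rewrite in different form
the earlier found representation of the function $f_{nr}(\bar{\alpha}, T)$, namely

$$f_{nr}(\bar{\alpha}, T) = \exp\left\{ \sum_{\nu=1}^{m} p_{nr}(1; t_{\nu-1}, t_\nu)\,(e^{i\alpha_\nu} - 1) + O[P\{X_{nr}(t_m) \geq 2\}] + O\left[(1 - p_{nr}(0; 0, t_m)) \sum_{\nu=1}^{m} p_{nr}(1; t_{\nu-1}, t_\nu)\right] \right\}$$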





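From this we conclude that

$$f_n(\bar{\alpha}, T) = \exp\left\{ \sum_{\nu=1}^{m} A_n(t_{\nu-1}, t_\nu)\,(e^{i\alpha_\nu} - 1) + O[B_n(0, t_m)] + O\left[\max_{1 \leq r \leq k_n} (1 - p_{nr}(0; 0, t_m)) \sum_{\nu=1}^{m} A_n(t_{\nu-1}, t_\nu)\right] \right\}$$

The conditions of the theorem now lead to the following limit
relationship: as $n$ tends to infinity,

$$f_n(\bar{\alpha}, T) \to \exp\left\{ \sum_{\nu=1}^{m} [\Lambda(t_\nu) - \Lambda(t_{\nu-1})]\,(e^{i\alpha_\nu} - 1) \right\}$$

thus proving the theorem.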

The theorem that has just been proved permits us to obtain a large
number of corollaries if we specialize the assumptions concerning
the terms of the step processes. Prior to Grigelionis, A. Ya. Khinchin
and G. A. Ososkov investigated the conditions of convergence of a
total process to an elementary one on the assumption that the
component processes are ordinary and their increments are stationary. The
result which they obtained signifies qualitatively that if the component
processes are independent, ordinary and stationary, then their
infinitesimal character (given certain supplementary general conditions
of a quantitative nature) practically ensures that such processes are
asymptotically close to elementary processes. This result is of interest
both to theory and to application.

Sec. 64. Elements of the Theory of Stand-by Systems 

In present-day technology, reliability of equipment is increased
by employing the method of stand-by systems, that is, the introduction
of extra components, units and entire assemblies. The purpose of
the supplementary devices is to take over operation if the basic
systems break down. Depending on the state of the stand-by equipment,
one distinguishes loaded, nonloaded and partially loaded relief.
In the case of loaded relief, the stand-by unit is in the same state
as the operating unit and for this reason has the same intensity of
breakdowns. In the partially loaded case, the stand-by device is
loaded, but not so fully as the main equipment and for this reason
has a different breakdown intensity. A stand-by unit that is not
loaded does not, naturally, suffer breakdown. The spare wheel of an
automobile is a typical example of nonloaded relief. Quite naturally,
loaded and nonloaded relief are special cases of partially loaded relief.

Numerous problems arise in the theory of stand-by systems that
differ only terminologically from the problems of queueing theory.
It is therefore natural, in a brief chapter devoted to the theory of
queues, to examine certain problems in the theory of stand-by
systems.





In order to illustrate the effectiveness of stand-by systems, let us
consider a small numerical example. Suppose the probability that
a certain device will operate without breakdown during a specified
period of time is 0.9. Four such devices are to operate independently.
The probability that all four devices will operate without failure is
$0.9^4 = 0.6561$. Now suppose we have one device in a state of loaded
relief. The probability of at least four devices operating without
failure during the specified time is equal to
$0.9^5 + 5 \times 0.9^4 \times 0.1 = 0.91854$. The presence of two reserve
units increases the probability of non-breakdown operation of the system
(that is, the probability of maintaining at least four devices in
operation during the entire specified period of time) to 0.98415.
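The three figures of this example can be recomputed with a few lines of code. The sketch below assumes, as in the text, that the devices fail independently and that loaded reserve units have the same reliability 0.9; the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(required, total, p):
    """Probability that at least `required` of `total` independent devices,
    each operating without failure with probability p, remain in operation."""
    return sum(comb(total, k) * p ** k * (1 - p) ** (total - k)
               for k in range(required, total + 1))


no_reserve = prob_at_least(4, 4, 0.9)    # four devices, none in reserve
one_reserve = prob_at_least(4, 5, 0.9)   # one device in loaded relief
two_reserves = prob_at_least(4, 6, 0.9)  # two devices in loaded relief
print(no_reserve, one_reserve, two_reserves)
```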

Many interesting results are given in a paper by A. D. Solovyev
devoted to stand-by systems without repair. The following is one such
result.

It is possible to have an entire device in reserve; for instance, a
generator at a power station or a diesel locomotive at a railway junction;
it is also possible to have in reserve a component of a system or even
a single element. The question arises as to what is preferable, to have
large units or single elements in reserve.

Theorem. If the switching of stand-by devices (units, elements, etc.)
is flawless, then both in the case of loaded and nonloaded relief, an
increase in the scale of the stand-by system reduces non-breakdown operation
of the whole system.

Proof. It will readily be seen that it is sufficient for us to confine
ourselves to the case when only two parts of a device are united and
there is one stand-by unit for each part. Figure 22a depicts
schematically the stand-by system of an expanded system, and Fig. 22b
illustrates stand-by relief of the individual parts. Let us call each of the
four parts an element. By $\tau_1$ and $\tau_2$ we denote the duration of faultless
operation of the basic elements and by $\tau_1'$ and $\tau_2'$ the duration of
the corresponding time of the stand-by units. The distributions of
these variables are arbitrary.

Fig. 22

Let us denote by $T_1$ the duration of flawless operation of the
supported system in the case of stand-by relief by a large unit and by
$T_2$ that in the case of stand-by relief by single elements.

It is obvious that in the case of loaded relief we have the equations

$$T_1 = \max[\min(\tau_1, \tau_2),\ \min(\tau_1', \tau_2')]$$





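and

$$T_2 = \min[\max(\tau_1, \tau_1'),\ \max(\tau_2, \tau_2')]$$

Since

$$T_1 \leq \max(\tau_1, \tau_1'), \qquad T_1 \leq \max(\tau_2, \tau_2')$$

it is clear that

$$T_1 \leq \min[\max(\tau_1, \tau_1'),\ \max(\tau_2, \tau_2')] = T_2$$

For loaded relief, the assertion of the theorem has been proved.
For nonloaded relief we have

$$T_1 = \min(\tau_1, \tau_2) + \min(\tau_1', \tau_2')$$

and

$$T_2 = \min[\tau_1 + \tau_1',\ \tau_2 + \tau_2']$$

Since

$$T_1 \leq \tau_1 + \tau_1', \qquad T_1 \leq \tau_2 + \tau_2'$$

we have the inequality

$$T_1 \leq \min[\tau_1 + \tau_1',\ \tau_2 + \tau_2'] = T_2$$

This proves the theorem for nonloaded relief.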
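The two inequalities of the proof can also be checked on random samples. The following sketch assumes the reading of the formulas given above: relief by a large unit gives $\max[\min(\tau_1,\tau_2), \min(\tau_1',\tau_2')]$ (loaded) or $\min(\tau_1,\tau_2)+\min(\tau_1',\tau_2')$ (nonloaded), and relief by single elements gives $\min[\max(\tau_1,\tau_1'), \max(\tau_2,\tau_2')]$ or $\min[\tau_1+\tau_1', \tau_2+\tau_2']$. The exponential sampling law is an arbitrary choice made for the illustration, since the theorem holds for any distributions.

```python
import random


def loaded_times(t1, t2, s1, s2):
    """Operating times under loaded relief: (large unit, single elements).
    t1, t2 are the basic elements, s1, s2 their stand-by elements."""
    large_unit = max(min(t1, t2), min(s1, s2))
    single_elements = min(max(t1, s1), max(t2, s2))
    return large_unit, single_elements


def nonloaded_times(t1, t2, s1, s2):
    """Operating times under nonloaded relief: (large unit, single elements)."""
    large_unit = min(t1, t2) + min(s1, s2)
    single_elements = min(t1 + s1, t2 + s2)
    return large_unit, single_elements


rng = random.Random(1)
violations = 0
for _ in range(10000):
    sample = [rng.expovariate(1.0) for _ in range(4)]
    for scheme in (loaded_times, nonloaded_times):
        large, single = scheme(*sample)
        if large > single:
            violations += 1
print(violations)
```

No sample violates either inequality, as the proof requires.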

To increase the effectiveness of stand-by systems, devices that have
failed are repaired. Let us investigate the effect of repair on
increasing the reliability. We confine ourselves to the case of one basic and
one reserve system.

Let us assume that the following conditions are fulfilled:

(1) on breakdown of the basic device, the stand-by unit immediately
takes up the load;

(2) the device that has failed undergoes repair immediately;

(3) the repairs fully restore the properties of the basic device that
failed;

(4) the repair time is a random variable with a distribution
function $G(x)$;

(5) the repaired device becomes a stand-by unit;

(6) the period of faultless operation of the device is random and
is distributed in accord with the law $F(x) = 1 - \exp(-\lambda x)$ ($\lambda > 0$)
for the basic device and in accord with the law $F_1(x) = 1 - \exp(-\lambda_1 x)$
($\lambda_1 \geq 0$) for the stand-by device. In particular, if the stand-by unit
is loaded, then $\lambda_1 = \lambda$, and if it is nonloaded, then $\lambda_1 = 0$.

We shall say that our system (basic unit plus stand-by unit)
breaks down if both devices are out of commission at the same time.
Denote by $R(x)$ the probability that the system will operate flawlessly
for a time greater than $x$. Let us also introduce the Laplace
transforms

$$g(s) = \int_0^{\infty} e^{-sx}\,dG(x), \qquad \varphi(s) = -\int_0^{\infty} e^{-sx}\,dR(x)$$




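Theorem. Under the conditions (1) to (6), the function $R(x)$
satisfies the integral equation

$$R(x) = \exp[-(\lambda + \lambda_1)x] + (\lambda + \lambda_1)\,e^{-\lambda x} \int_0^x e^{-\lambda_1 z}\,[1 - G(x - z)]\,dz + (\lambda + \lambda_1) \int_0^x \int_0^{x-y} e^{-(\lambda + \lambda_1)y - \lambda z}\,R(x - y - z)\,dG(z)\,dy \qquad (1)$$

In terms of Laplace transforms, the solution of this equation is
given by the formula

$$\varphi(s) = \frac{\lambda(\lambda + \lambda_1)(1 - g(\lambda + s))}{(\lambda + s)[s + (\lambda + \lambda_1)(1 - g(\lambda + s))]} \qquad (2)$$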

Proof. The event we are interested in, flawless operation of the
system during time from 0 to $x$, is decomposable into three
mutually exclusive events:

1. During the time $(0, x)$ neither the basic nor the stand-by
element fails. The probability of this is equal to $e^{-(\lambda + \lambda_1)x}$.

2. The first breakdown occurs prior to time $x$. The remaining
element operates flawlessly up to time $x$. Repair of the element
which has failed is not completed prior to time $x$. The probability
of this event is equal to

$$(\lambda + \lambda_1)\,e^{-\lambda x} \int_0^x e^{-\lambda_1 z}\,[1 - G(x - z)]\,dz$$

3. The first breakdown occurs prior to time $x$, the repair of
this element is also completed prior to time $x$, and during the repair
period the remaining element was functional. From the time of
repair to time $x$, the system functioned normally. The probability
of this event is equal to

$$(\lambda + \lambda_1) \iint_{y + z < x} e^{-(\lambda + \lambda_1)y - \lambda z}\,R(x - y - z)\,dy\,dG(z)$$

Equating $R(x)$ to the sum of the three enumerated probabilities
yields Equation (1).

Employment of the most elementary properties of Laplace
transforms converts (1) into the equation

$$\varphi(s) = \frac{\lambda + \lambda_1}{s + \lambda + \lambda_1} \left[ \frac{\lambda(1 - g(\lambda + s))}{\lambda + s} + g(\lambda + s)\,\varphi(s) \right]$$

from which (2) follows immediately.

It will be noted that by virtue of the properties of the
exponential distribution, the result obtained is immediately extended
to the case when there are $n$ operating devices and one stand-by




unit. All devices have the same properties, that is, they have the
same distribution functions for operating time and repairs. It is
merely necessary to replace $\lambda$ by $n\lambda$ in the formulas (1) and (2).

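It is easy to calculate that the expectation of the time of
flawless operation of the system is equal to

$$a = -\left[\frac{d\varphi(s)}{ds}\right]_{s=0} = \frac{\lambda + (\lambda + \lambda_1)(1 - g(\lambda))}{\lambda(\lambda + \lambda_1)(1 - g(\lambda))} \qquad (3)$$

In particular, for a nonloaded stand-by system, we have

$$a = \frac{2 - g(\lambda)}{\lambda(1 - g(\lambda))} \qquad (3')$$

and for a loaded stand-by system

$$a = \frac{3 - 2g(\lambda)}{2\lambda(1 - g(\lambda))} \qquad (3'')$$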
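The mean time of flawless operation is easy to evaluate numerically. The sketch below assumes that formula (3) reads $a = [\lambda + (\lambda+\lambda_1)(1-g(\lambda))] / [\lambda(\lambda+\lambda_1)(1-g(\lambda))]$ and compares it with the two special cases; the function name and the exponential repair law used to obtain a value of $g(\lambda)$ are assumptions made for the illustration.

```python
def mean_operating_time(lam, lam1, g):
    """Mean time of flawless operation of the system, formula (3) as read here.

    lam  -- failure intensity of the basic device
    lam1 -- failure intensity of the stand-by device
    g    -- value g(lam) of the Laplace transform of the repair-time law
    """
    return (lam + (lam + lam1) * (1 - g)) / (lam * (lam + lam1) * (1 - g))


lam, nu = 1.0, 4.0
g = nu / (lam + nu)  # exponential repair time with parameter nu (an assumption)

nonloaded = mean_operating_time(lam, 0.0, g)  # lam1 = 0
loaded = mean_operating_time(lam, lam, g)     # lam1 = lam
print(nonloaded, loaded)
```

With $\lambda_1 = 0$ and $\lambda_1 = \lambda$ the general expression reduces to the nonloaded and loaded special cases, respectively.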


In the cases that are of most practical interest, the mean
duration of repairs is considerably less than the mean time of flawless
operation of the device. In order to invest with precise meaning
the results that may be detected here, we prove the following limit
theorems.

Suppose that the function $G(x)$ depends on a certain parameter $\nu$
and for any $\varepsilon > 0$, as $\nu \to \infty$,

$$1 - G_\nu(\varepsilon) \to 0 \qquad (4)$$

It is easy to see that the following relation immediately follows
from (4):

$$g_\nu(s) \to 1 \qquad (5)$$

as $\nu$ tends to infinity.

The converse is also true: if for any $s > 0$ and as $\nu$ tends to
infinity we have the relation $g_\nu(s) \to 1$, then for any $x > 0$, as $\nu$
tends to infinity, $G_\nu(x)$ tends to one.


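Theorem. If condition (4) holds, then the flow of failures of a
reduplicated system [conditions (1) to (6) are also assumed to hold]
tends to the elementary case, given the choice of a proper unit
of time.

Proof. Put

$$\alpha_\nu = 1 - g_\nu(\lambda)$$

Then, by virtue of (2),

$$\varphi_\nu(\alpha_\nu s) = \frac{\lambda(\lambda + \lambda_1)}{(\lambda + \alpha_\nu s)\left[\lambda + \lambda_1 + \dfrac{\alpha_\nu s}{1 - g_\nu(\lambda + \alpha_\nu s)}\right]} \qquad (6)$$

Note that

$$\frac{\alpha_\nu}{1 - g_\nu(\lambda + \alpha_\nu s)} = 1 - \frac{g_\nu(\lambda) - g_\nu(\lambda + \alpha_\nu s)}{1 - g_\nu(\lambda + \alpha_\nu s)}$$

and that for $s > 0$

$$g_\nu(\lambda) - g_\nu(\lambda + \alpha_\nu s) = \int_0^{\infty} e^{-\lambda x}\,(1 - e^{-\alpha_\nu s x})\,dG_\nu(x) \leq s\alpha_\nu \int_0^{\infty} x e^{-\lambda x}\,dG_\nu(x)$$

But by virtue of (4), for sufficiently large $\nu$,

$$\int_0^{\infty} x e^{-\lambda x}\,dG_\nu(x) = \int_0^{\varepsilon} x e^{-\lambda x}\,dG_\nu(x) + \int_{\varepsilon}^{\infty} x e^{-\lambda x}\,dG_\nu(x) \leq \varepsilon + \max_x (x e^{-\lambda x}) \int_{\varepsilon}^{\infty} dG_\nu(x) \leq \varepsilon + \frac{\varepsilon}{\lambda e} = A\varepsilon$$

Thus, in any finite interval of $s$

$$\frac{\alpha_\nu}{1 - g_\nu(\lambda + \alpha_\nu s)} \to 1 \qquad (7)$$

uniformly in $s$.

Now from (6) and (7) it follows that as $\nu$ tends to infinity,
uniformly in $s$,

$$\varphi_\nu(\alpha_\nu s) \to \frac{\lambda + \lambda_1}{\lambda + \lambda_1 + s}$$

By the familiar theorems on Laplace transforms this means that
the distribution of the random variable $\alpha_\nu \gamma_\nu$, where $\gamma_\nu$ denotes the
length of the interval between two successive failures of the system
on the condition that the repair-time function is $G_\nu(x)$, converges
to the distribution $1 - e^{-(\lambda + \lambda_1)x}$, as asserted by the theorem.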

To estimate the effect of repair on the operational effectiveness
of a system it is natural to consider the ratio of the mean
operational time of a system with repair to that without repair. The
former is calculated from the formula (3), the latter from the formula

$$\frac{2\lambda + \lambda_1}{\lambda(\lambda + \lambda_1)}$$

The effectiveness of repairs is

$$\frac{\lambda + (\lambda + \lambda_1)(1 - g_\nu(\lambda))}{(2\lambda + \lambda_1)(1 - g_\nu(\lambda))} \qquad (8)$$

Let us now determine what effect the choice of the function $G_\nu(x)$
has on the value of (8). In this case, we shall naturally take all





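$G_\nu(x)$ participating in the comparison to have the same expectation,
which is assumed to be equal to $1/\nu$. For this purpose, consider
the following distribution functions:

$$\text{I.} \quad G_\nu(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ \tfrac{1}{2} & \text{for } 0 < x \leq \tfrac{2}{\nu} \\ 1 & \text{for } x > \tfrac{2}{\nu} \end{cases}$$

$$\text{II.} \quad G_\nu(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ 1 - e^{-\nu x} & \text{for } x > 0 \end{cases}$$

$$\text{III.} \quad G_\nu(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ \tfrac{\nu x}{2} & \text{for } 0 < x \leq \tfrac{2}{\nu} \\ 1 & \text{for } x > \tfrac{2}{\nu} \end{cases}$$

$$\text{IV.} \quad G_\nu(x) = \begin{cases} 0 & \text{for } x \leq 0 \\ \dfrac{(3\nu)^3}{2} \displaystyle\int_0^x z^2 e^{-3\nu z}\,dz & \text{for } x > 0 \end{cases}$$

$$\text{V.} \quad G_\nu(x) = \begin{cases} 0 & \text{for } x \leq \tfrac{1}{\nu} \\ 1 & \text{for } x > \tfrac{1}{\nu} \end{cases}$$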

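We confine ourselves to the case of nonloaded stand-by relief
and in Table 13 give all calculations dealing with the effectiveness
of repair for the enumerated distributions.

TABLE 13

$G_\nu(x)$   Effectiveness of repair                                                                 $\nu/\lambda = 1$     2       4       10

I        $1 + \frac{1 + e^{-2\lambda/\nu}}{2(1 - e^{-2\lambda/\nu})}$                                      1.66    2.08    3.04    6.02
II       $1 + \frac{\nu}{2\lambda}$                                                                   1.50    2.00    3.00    6.00
III      $1 + \frac{\nu(1 - e^{-2\lambda/\nu})}{2[2\lambda - \nu(1 - e^{-2\lambda/\nu})]}$               1.38    1.86    2.85    5.84
IV       $1 + \frac{(3\nu)^3}{2[(\lambda + 3\nu)^3 - (3\nu)^3]}$                                       1.36    1.85    2.84    5.84
V        $1 + \frac{e^{-\lambda/\nu}}{2(1 - e^{-\lambda/\nu})}$                                          1.29    1.77    2.76    5.75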













The above table gives an amazingly small spread of the effectiveness
of repair for such utterly different distributions of repair periods
that we chose. The somewhat greater effectiveness for the first two
distributions is due to the fact that they have an appreciable
possibility of repair within short periods of time. The fact that the last
distribution requires one and the same time for any repair reduces the
effectiveness somewhat. The fact that the figures given in the table
are so close to each other follows from the theorem we are about to
prove.
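The figures of Table 13 can be recomputed directly. The sketch below assumes that, for nonloaded relief, the effectiveness of repair equals $(2 - g_\nu(\lambda)) / (2(1 - g_\nu(\lambda)))$ and that the five repair-time laws have the Laplace transforms written in the code (two-point law, exponential law, uniform law on $(0, 2/\nu)$, gamma law with shape 3, constant repair time); these readings of the distributions and all names in the code are assumptions of this illustration.

```python
import math


def effectiveness_nonloaded(g):
    """Ratio of the mean operating time with repair to that without repair
    for nonloaded relief (lambda_1 = 0); g is the value g(lambda)."""
    return (2 - g) / (2 * (1 - g))


def transforms(r):
    """Values g(lambda) of the five repair-time laws with mean 1/nu.
    r is the ratio nu / lambda."""
    e2 = math.exp(-2 / r)
    return {
        "I": (1 + e2) / 2,                 # repair time 0 or 2/nu, each with probability 1/2
        "II": r / (r + 1),                 # exponential law with parameter nu
        "III": r * (1 - e2) / 2,           # uniform law on (0, 2/nu)
        "IV": (3 * r / (1 + 3 * r)) ** 3,  # gamma law with shape 3 and rate 3*nu
        "V": math.exp(-1 / r),             # constant repair time 1/nu
    }


table = {r: {name: round(effectiveness_nonloaded(g), 2)
             for name, g in transforms(r).items()}
         for r in (1, 2, 4, 10)}

for r, row in table.items():
    print(r, row)
```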

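Suppose that

$$m_1(\nu) = \int_0^{\infty} x\,dG_\nu(x) = \frac{1}{\nu}, \qquad m_2(\nu) = \int_0^{\infty} x^2\,dG_\nu(x) < +\infty$$

and, as $\nu$ tends to infinity,

$$\frac{m_2(\nu)}{m_1(\nu)} \to 0 \qquad (9)$$

Theorem. If the condition (9) is satisfied in addition to the
conditions (1) to (6), then for large values of $\nu$ the mean time of
flawless operation of a system with stand-by relief is asymptotically
equal to the mean time of the system under the assumption that
$G_\nu(x) = 1 - e^{-\nu x}$.

Proof. Since for any $x > 0$

$$0 \leq e^{-\lambda x} - 1 + \lambda x \leq \frac{(\lambda x)^2}{2}$$

it follows that

$$\int_0^{\infty} |e^{-\lambda x} - 1 + \lambda x|\,dG_\nu(x) \leq \frac{\lambda^2}{2}\,m_2(\nu)$$

It will be noted that

$$1 - g_\nu(\lambda) = \lambda \int_0^{\infty} x\,dG_\nu(x) - \int_0^{\infty} (e^{-\lambda x} - 1 + \lambda x)\,dG_\nu(x)$$

By virtue of (9)

$$1 - g_\nu(\lambda) = \lambda m_1(\nu)\,[1 + o(1)] = \frac{\lambda}{\nu}\,(1 + o(1))$$

Substituting this estimate into (3), we find

$$a = \frac{2\lambda + \lambda_1 + \nu}{\lambda(\lambda + \lambda_1)}\,(1 + o(1)) \qquad (10)$$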





A simple calculation shows that for the distribution $G_\nu(x) = 1 - e^{-\nu x}$

$$a = \frac{2\lambda + \lambda_1 + \nu}{\lambda(\lambda + \lambda_1)} \qquad (11)$$

A comparison of (10) and (11) proves the theorem.

Note that the condition (9) is automatically satisfied for all
distributions with finite variance for which the following equation
is valid:

$$G_\nu(x) = G_1(\nu x)$$

This relation holds for many distributions of practical importance,
such as the Weibull distribution:

$$G(x) = 1 - e^{-\lambda x^{\alpha}} \qquad (\lambda > 0,\ \alpha > 0)$$

the gamma distribution:

$$G'(x) = c\,x^{\alpha} e^{-\beta x} \qquad (\alpha > -1,\ \beta > 0)$$

and others.
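The remark on condition (9) can be verified in one line. The following worked computation is a sketch under the assumption that $G_\nu(x) = G_1(\nu x)$ and that $G_1$ has a finite second moment; the substitution $u = \nu x$ is the only step used.

```latex
m_k(\nu) = \int_0^{\infty} x^k \, dG_1(\nu x)
         = \nu^{-k} \int_0^{\infty} u^k \, dG_1(u)
         = \nu^{-k} m_k(1), \qquad k = 1, 2,
\qquad\text{hence}\qquad
\frac{m_2(\nu)}{m_1(\nu)} = \frac{1}{\nu} \cdot \frac{m_2(1)}{m_1(1)} \to 0
\quad (\nu \to \infty).
```

The normalization $m_1(\nu) = 1/\nu$ then corresponds to $m_1(1) = 1$.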



Appendix 



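TABLE A.1. Values of $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$

  x      0       1       2       3       4       5       6       7       8       9

 0.0  0.3989  0.3989  0.3989  0.3988  0.3986  0.3984  0.3982  0.3980  0.3977  0.3973
 0.1  0.3970  0.3965  0.3961  0.3956  0.3951  0.3945  0.3939  0.3932  0.3925  0.3918
 0.2  0.3910  0.3902  0.3894  0.3885  0.3876  0.3867  0.3857  0.3847  0.3836  0.3825
 0.3  0.3814  0.3802  0.3790  0.3778  0.3765  0.3752  0.3739  0.3726  0.3712  0.3697
 0.4  0.3683  0.3668  0.3653  0.3637  0.3621  0.3605  0.3589  0.3572  0.3555  0.3538
 0.5  0.3521  0.3503  0.3485  0.3467  0.3448  0.3429  0.3410  0.3391  0.3372  0.3352
 0.6  0.3332  0.3312  0.3292  0.3271  0.3251  0.3230  0.3209  0.3187  0.3166  0.3144
 0.7  0.3123  0.3101  0.3079  0.3056  0.3034  0.3011  0.2989  0.2966  0.2943  0.2920
 0.8  0.2897  0.2874  0.2850  0.2827  0.2803  0.2780  0.2756  0.2732  0.2709  0.2685
 0.9  0.2661  0.2637  0.2613  0.2589  0.2565  0.2541  0.2516  0.2492  0.2468  0.2444
 1.0  0.2420  0.2396  0.2371  0.2347  0.2323  0.2299  0.2275  0.2251  0.2227  0.2203
 1.1  0.2179  0.2155  0.2131  0.2107  0.2083  0.2059  0.2036  0.2012  0.1989  0.1965
 1.2  0.1942  0.1919  0.1895  0.1872  0.1849  0.1826  0.1804  0.1781  0.1758  0.1736
 1.3  0.1714  0.1691  0.1669  0.1647  0.1626  0.1604  0.1582  0.1561  0.1539  0.1518
 1.4  0.1497  0.1476  0.1456  0.1435  0.1415  0.1394  0.1374  0.1354  0.1334  0.1315
 1.5  0.1295  0.1276  0.1257  0.1238  0.1219  0.1200  0.1182  0.1163  0.1145  0.1127
 1.6  0.1109  0.1092  0.1074  0.1057  0.1040  0.1023  0.1006  0.0989  0.0973  0.0957
 1.7  0.0940  0.0925  0.0909  0.0893  0.0878  0.0863  0.0848  0.0833  0.0818  0.0804
 1.8  0.0790  0.0775  0.0761  0.0748  0.0734  0.0721  0.0707  0.0694  0.0681  0.0669
 1.9  0.0656  0.0644  0.0632  0.0620  0.0608  0.0596  0.0584  0.0573  0.0562  0.0551
 2.0  0.0540  0.0529  0.0519  0.0508  0.0498  0.0488  0.0478  0.0468  0.0459  0.0449
 2.1  0.0440  0.0431  0.0422  0.0413  0.0404  0.0396  0.0387  0.0379  0.0371  0.0363
 2.2  0.0355  0.0347  0.0339  0.0332  0.0325  0.0317  0.0310  0.0303  0.0297  0.0290
 2.3  0.0283  0.0277  0.0270  0.0264  0.0258  0.0252  0.0246  0.0241  0.0235  0.0229
 2.4  0.0224  0.0219  0.0213  0.0208  0.0203  0.0198  0.0194  0.0189  0.0184  0.0180
 2.5  0.0175  0.0171  0.0167  0.0163  0.0158  0.0154  0.0151  0.0147  0.0143  0.0139
 2.6  0.0136  0.0132  0.0129  0.0126  0.0122  0.0119  0.0116  0.0113  0.0110  0.0107
 2.7  0.0104  0.0101  0.0099  0.0096  0.0093  0.0091  0.0088  0.0086  0.0084  0.0081
 2.8  0.0079  0.0077  0.0075  0.0073  0.0071  0.0069  0.0067  0.0065  0.0063  0.0061
 2.9  0.0060  0.0058  0.0056  0.0055  0.0053  0.0051  0.0050  0.0048  0.0047  0.0046
 3.0  0.0044  0.0043  0.0042  0.0040  0.0039  0.0038  0.0037  0.0036  0.0035  0.0034
 3.1  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026  0.0025  0.0025
 3.2  0.0024  0.0023  0.0022  0.0022  0.0021  0.0020  0.0020  0.0019  0.0018  0.0018
 3.3  0.0017  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014  0.0013  0.0013
 3.4  0.0012  0.0012  0.0012  0.0011  0.0011  0.0010  0.0010  0.0010  0.0009  0.0009
 3.5  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007  0.0007  0.0007  0.0006
 3.6  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005  0.0005  0.0005  0.0005  0.0004
 3.7  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003  0.0003  0.0003  0.0003
 3.8  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002  0.0002  0.0002  0.0002  0.0002
 3.9  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0001  0.0001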












TABLE A.2. Values of $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-z^2/2}\,dz$

  x       0        1        2        3        4        5        6        7        8        9

 0.0  0.00000  0.00399  0.00798  0.01197  0.01595  0.01994  0.02392  0.02790  0.03188  0.03586
 0.1  0.03983  0.04380  0.04776  0.05172  0.05567  0.05962  0.06356  0.06749  0.07142  0.07535
 0.2  0.07926  0.08317  0.08706  0.09095  0.09483  0.09871  0.10257  0.10642  0.11026  0.11409
 0.3  0.11791  0.12172  0.12552  0.12930  0.13307  0.13683  0.14058  0.14431  0.14803  0.15173
 0.4  0.15542  0.15910  0.16276  0.16640  0.17003  0.17364  0.17724  0.18082  0.18439  0.18793
 0.5  0.19146  0.19497  0.19847  0.20194  0.20540  0.20884  0.21226  0.21566  0.21904  0.22240
 0.6  0.22575  0.22907  0.23237  0.23565  0.23891  0.24215  0.24537  0.24857  0.25175  0.25490
 0.7  0.25804  0.26115  0.26424  0.26730  0.27035  0.27337  0.27637  0.27935  0.28230  0.28524
 0.8  0.28814  0.29103  0.29389  0.29673  0.29955  0.30234  0.30511  0.30785  0.31057  0.31327
 0.9  0.31594  0.31859  0.32121  0.32381  0.32639  0.32894  0.33147  0.33398  0.33646  0.33891
 1.0  0.34134  0.34375  0.34614  0.34850  0.35083  0.35314  0.35543  0.35769  0.35993  0.36214
 1.1  0.36433  0.36650  0.36864  0.37076  0.37286  0.37493  0.37698  0.37900  0.38100  0.38298
 1.2  0.38493  0.38686  0.38877  0.39065  0.39251  0.39435  0.39617  0.39796  0.39973  0.40147
 1.3  0.40320  0.40490  0.40658  0.40824  0.40988  0.41149  0.41309  0.41466  0.41621  0.41774
 1.4  0.41924  0.42073  0.42220  0.42364  0.42507  0.42647  0.42786  0.42922  0.43056  0.43189
 1.5  0.43319  0.43448  0.43574  0.43699  0.43822  0.43943  0.44062  0.44179  0.44295  0.44408
 1.6  0.44520  0.44630  0.44738  0.44845  0.44950  0.45053  0.45154  0.45254  0.45352  0.45449
 1.7  0.45543  0.45637  0.45728  0.45818  0.45907  0.45994  0.46080  0.46164  0.46246  0.46327
 1.8  0.46407  0.46485  0.46562  0.46638  0.46712  0.46784  0.46856  0.46926  0.46995  0.47062
 1.9  0.47128  0.47193  0.47257  0.47320  0.47381  0.47441  0.47500  0.47558  0.47615  0.47670
 2.0  0.47725  0.47778  0.47831  0.47882  0.47932  0.47982  0.48030  0.48077  0.48124  0.48169
 2.1  0.48214  0.48257  0.48300  0.48341  0.48382  0.48422  0.48461  0.48500  0.48537  0.48574
 2.2  0.48610  0.48645  0.48679  0.48713  0.48745  0.48778  0.48809  0.48840  0.48870  0.48899
 2.3  0.48928  0.48956  0.48983  0.49010  0.49036  0.49061  0.49086  0.49111  0.49134  0.49158
 2.4  0.49180  0.49202  0.49224  0.49245  0.49266  0.49286  0.49305  0.49324  0.49343  0.49361
 2.5  0.49379  0.49396  0.49413  0.49430  0.49446  0.49461  0.49477  0.49492  0.49506  0.49520
 2.6  0.49534  0.49547  0.49560  0.49573  0.49585  0.49598  0.49609  0.49621  0.49632  0.49643
 2.7  0.49653  0.49664  0.49674  0.49683  0.49693  0.49702  0.49711  0.49720  0.49728  0.49736
 2.8  0.49744  0.49752  0.49760  0.49767  0.49774  0.49781  0.49788  0.49795  0.49801  0.49807
 2.9  0.49813  0.49819  0.49825  0.49831  0.49836  0.49841  0.49846  0.49851  0.49856  0.49861

  x     $\Phi(x)$        x     $\Phi(x)$        x     $\Phi(x)$

 3.0   0.49865      3.5   0.49977      4.0   0.499968
 3.1   0.49903      3.6   0.49984      4.5   0.499997
 3.2   0.49931      3.7   0.49989      5.0   0.49999997
 3.3   0.49952      3.8   0.49993
 3.4   0.49966      3.9   0.49995
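Entries of Tables A.1 and A.2 can be recomputed when a value outside the tabulated grid is needed. The sketch below assumes that Table A.1 tabulates the normal density $\varphi(x)$ and Table A.2 the integral of this density from 0 to $x$; it relies on the standard-library function `math.erf`, and the two function names are hypothetical.

```python
import math


def normal_density(x):
    """Density of the standard normal law (Table A.1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def laplace_function(x):
    """Integral of the normal density from 0 to x (Table A.2)."""
    return 0.5 * math.erf(x / math.sqrt(2))


print(round(normal_density(1.0), 4))
print(round(laplace_function(1.96), 5))
```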

























TABLE A.3. Values of $P_k(a) = \frac{a^k e^{-a}}{k!}$

 k \ a     0.1       0.2       0.3       0.4       0.5       0.6

   0    0.904837  0.818731  0.740818  0.670320  0.606531  0.548812
   1    0.090484  0.163746  0.222245  0.268128  0.303265  0.329287
   2    0.004524  0.016375  0.033337  0.053626  0.075816  0.098786
   3    0.000151  0.001091  0.003334  0.007150  0.012636  0.019757
   4    0.000004  0.000055  0.000250  0.000715  0.001580  0.002964
   5              0.000002  0.000015  0.000057  0.000158  0.000356
   6                        0.000001  0.000004  0.000013  0.000035
   7                                            0.000001  0.000003

 k \ a     0.7       0.8       0.9       1.0       2.0       3.0

   0    0.496585  0.449329  0.406570  0.367879  0.135335  0.049787
   1    0.347610  0.359463  0.365913  0.367879  0.270671  0.149361
   2    0.121663  0.143785  0.164661  0.183940  0.270671  0.224042
   3    0.028388  0.038343  0.049398  0.061313  0.180447  0.224042
   4    0.004968  0.007669  0.011115  0.015328  0.090224  0.168031
   5    0.000695  0.001227  0.002001  0.003066  0.036089  0.100819
   6    0.000081  0.000164  0.000300  0.000511  0.012030  0.050409
   7    0.000008  0.000019  0.000039  0.000073  0.003437  0.021604
   8              0.000002  0.000004  0.000009  0.000859  0.008101
   9                                  0.000001  0.000191  0.002701
  10                                            0.000038  0.000810
  11                                            0.000007  0.000221
  12                                            0.000001  0.000055
  13                                                      0.000013
  14                                                      0.000003
  15                                                      0.000001
















TABLE A.3 (continued)

 k \ a     4.0       5.0       6.0       7.0       8.0       9.0

   0    0.018316  0.006738  0.002479  0.000912  0.000335  0.000123
   1    0.073263  0.033690  0.014873  0.006383  0.002684  0.001111
   2    0.146525  0.084224  0.044618  0.022341  0.010735  0.004998
   3    0.195367  0.140374  0.089235  0.052129  0.028626  0.014994
   4    0.195367  0.175467  0.133853  0.091226  0.057252  0.033737
   5    0.156293  0.175467  0.160623  0.127717  0.091604  0.060727
   6    0.104194  0.146223  0.160623  0.149003  0.122138  0.091090
   7    0.059540  0.104445  0.137677  0.149003  0.139587  0.117116
   8    0.029770  0.065278  0.103258  0.130377  0.139587  0.131756
   9    0.013231  0.036266  0.068838  0.101405  0.124077  0.131756
  10    0.005292  0.018133  0.041303  0.070983  0.099262  0.118580
  11    0.001925  0.008242  0.022529  0.045171  0.072190  0.097020
  12    0.000642  0.003434  0.011262  0.026350  0.048127  0.072765
  13    0.000197  0.001321  0.005199  0.014188  0.029616  0.050376
  14    0.000056  0.000472  0.002228  0.007094  0.016924  0.032384
  15    0.000015  0.000157  0.000891  0.003311  0.009026  0.019431
  16    0.000004  0.000049  0.000334  0.001448  0.004513  0.010930
  17    0.000001  0.000014  0.000118  0.000596  0.002124  0.005786
  18              0.000004  0.000039  0.000232  0.000944  0.002893
  19              0.000001  0.000012  0.000085  0.000397  0.001370
  20                        0.000004  0.000030  0.000159  0.000617
  21                        0.000001  0.000010  0.000061  0.000264
  22                                  0.000003  0.000022  0.000108
  23                                  0.000001  0.000008  0.000042
  24                                            0.000003  0.000016
  25                                            0.000001  0.000006
  26                                                      0.000002
  27                                                      0.000001






TABLE A.4. Values of $\sum_{m=0}^{k} \frac{a^m e^{-a}}{m!}$

 k \ a     0.1        0.2        0.3        0.4        0.5        0.6

  0     0.904837   0.818731   0.740818   0.670320   0.606531   0.548812
  1     0.995321   0.982477   0.963063   0.938448   0.909796   0.878099
  2     0.999845   0.998852   0.996400   0.992074   0.985612   0.976885
  3     0.999996   0.999943   0.999734   0.999224   0.998248   0.996642
  4     1.000000   0.999998   [missing]  0.999939   [missing]  0.999606
  5     1.000000   1.000000   0.999999   0.999996   0.999986   0.999962
  6     1.000000   1.000000   1.000000   1.000000   0.999999   0.999997
  7     1.000000   1.000000   1.000000   1.000000   1.000000   1.000000

 k \ a     0.7        0.8        0.9        1.0        2.0        3.0

  0     0.496585   0.449329   0.406570   0.367879   0.135335   0.049787
  1     0.844195   0.808792   0.772483   0.735759   0.406006   0.199148
  2     0.965858   0.952577   0.937144   0.919699   0.676677   0.423190
  3     0.994246   0.990920   0.986542   0.981012   0.857124   0.647232
  4     0.999214   0.998589   0.997657   0.996340   0.947348   0.815263
  5     0.999909   0.999816   0.999658   0.999406   0.983437   0.916082
  6     0.999990   0.999980   0.999958   0.999917   0.995467   0.966491
  7     0.999998   0.999999   0.999997   0.999990   0.998904   0.988095
  8     1.000000   1.000000   1.000000   0.999999   0.999763   0.996196
  9                                      1.000000   0.999954   0.998897
 10                                                 0.999992   0.999707
 11                                                 0.999999   0.999928
 12                                                 1.000000   0.999983
 13                                                            0.999996
 14                                                            0.999999
 15                                                            1.000000




























TABLE A.4 (continued)

 k \ a     4.0        5.0        6.0        7.0        8.0        9.0

  0     0.018316   0.006738   0.002479   0.000912   0.000335   0.000123
  1     0.091579   0.040428   0.017352   0.007295   0.003019   0.001234
  2     0.238105   0.124652   0.061970   0.029636   0.013754   0.006232
  3     0.433472   0.265026   0.151205   0.081765   0.042380   0.021228
  4     0.628839   0.440493   0.285058   0.172991   0.099632   0.054963
  5     0.785132   0.615960   0.445681   0.300708   0.191236   0.115690
  6     0.889326   0.762183   0.606304   0.449711   0.313374   0.206780
  7     0.948866   0.866628   0.743981   0.598714   0.452961   0.323896
  8     0.978636   0.931806   0.847239   0.729091   0.592548   0.455652
  9     0.991867   0.968172   0.916077   0.830496   [missing]  0.587408
 10     0.997159   0.986305   0.957380   0.901479   0.815887   0.705988
 11     [missing]  0.994547   0.979909   0.946650   0.888077   0.803008
 12     0.999726   0.997981   0.991173   0.973000   0.936204   0.875773
 13     0.999923   0.999202   0.996372   0.987188   0.965820   0.926149
 14     0.999979   0.999774   [missing]  [missing]  0.982744   0.958533
 15     0.999994   0.999931   0.999491   0.997593   0.991770   0.977964
 16     0.999998   0.999980   0.999825   0.999041   0.996283   0.988894
 17     0.999999   0.999994   0.999943   0.999637   0.998407   0.994680
 18     0.999999   0.999998   0.999982   0.999869   0.999351   0.997573
 19     0.999999   0.999999   0.999994   0.999955   0.999748   0.998943
 20     1.000000   0.999999   0.999998   0.999985   0.999907   0.999560
 21                1.000000   0.999999   0.999995   0.999967   0.999824
 22                           0.999999   0.999998   0.999989   0.999932
 23                           1.000000   0.999999   0.999997   0.999974
 24                                      0.999999   0.999999   0.999990
 25                                      1.000000   0.999999   0.999996
 26                                                 1.000000   0.999998
 27                                                            0.999999
 28                                                            1.000000
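
Entries of Tables A.3 and A.4 can be spot-checked numerically. The following is a minimal Python sketch, assuming (as the tabulated values indicate) that Table A.3 lists the Poisson probabilities a^k e^{-a}/k! and Table A.4 their cumulative sums over 0, 1, ..., k. The function names `poisson_pmf` and `poisson_cdf` are illustrative and do not come from the text.

```python
import math


def poisson_pmf(k: int, a: float) -> float:
    """Poisson probability a^k * e^(-a) / k! (assumed content of Table A.3)."""
    return math.exp(-a) * a**k / math.factorial(k)


def poisson_cdf(k: int, a: float) -> float:
    """Cumulative sum of poisson_pmf(m, a) for m = 0..k (assumed content of Table A.4)."""
    return sum(poisson_pmf(m, a) for m in range(k + 1))


if __name__ == "__main__":
    # Table A.3 lists 0.195367 for k = 3, a = 4.0.
    print(f"{poisson_pmf(3, 4.0):.6f}")
    # Table A.4 lists 0.676677 for k = 2, a = 2.0.
    print(f"{poisson_cdf(2, 2.0):.6f}")
```

A directly computed cumulative value may differ from the printed one by a unit in the sixth decimal place. The printed cumulative entries appear to be sums of the already rounded entries of Table A.3, so comparisons should allow a small tolerance.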








Bibliography 

(STARRED ITEMS ARE IN RUSSIAN) 


POPULAR 

1. Borel, E., Le hasard, Paris, 2nd ed., 1948. 

2. Borel, E., Probability and Certainty, New York, 1963. 

3. * Gnedenko, B. V., How Mathematics Studies Random Phenomena, Izd. Akad.
Nauk Ukr. S.S.R., Kiev, 1947.

4. * Gnedenko, B. V. and Khinchin, A. Ya., An Elementary Introduction to
Probability Theory, 6th ed., Izd. “Nauka”, 1964.

5. * Yaglom, A. M. and Yaglom, I. M., Probability and Information, 2nd ed.,
Fizmatgiz, 1960.


TEXTBOOKS AND MONOGRAPHS 

6. Bartlett, M. S., An Introduction to Stochastic Processes, Cambridge, 1955. 

7. * Bernstein, S. N., Probability Theory, 4th ed., Gostekhizdat, 1946. 

8. Blackwell, D. and Girshick, M. A., Theory of Games and Statistical Decisions, 
New York, 1954. 

9. Blanc-Lapierre, A. et Fortet, R., Theorie des fonctions aleatoires, Paris, 1953. 

10. Chandrasekhar, S., Stochastic Problems in Physics and Astronomy, Rev. Modern
Phys., Vol. 15, 1943.

11. Chung, Kai-Lai, Markov Chains with Stationary Transition Probabilities, 
Springer, Berlin, 1960. 

12. Cramer, H., Random Variables and Probability Distributions, Cambridge 
University Press, 2nd ed., 1961. 

13. Doob, J. L., Stochastic Processes, New York, 1953.

14. * Dunin-Barkovsky, I. V. and Smirnov, N. V., The Theory of Probability and
Mathematical Statistics (general part), GTTI, Moscow, 1955.

15. * Dynkin, E. B., Fundamentals in the Theory of Markov Processes, Fizmatgiz,
1959.

16. * Dynkin, E. B., Markov Processes, Fizmatgiz, 1963.

17. * Einstein and Smoluchowski, Collection of Articles on the Theory of
Brownian Motion, ONTI, 1936.

18. Fisz, M., Rachunek prawdopodobienstwa i statystyka matematyczna,
Warszawa, 1958.





19. Frechet, M., Recherches theoriques modernes. Traite du calcul des
probabilites, Paris, 1937, t. I, II.

20. * Gilenko, N. D., Problems in Probability Theory, Uchpedgiz, 1943.

21. * Glivenko, V. I., A Course in the Theory of Probability, GONTI, 1939.

22. * Glivenko, V. I., The Stieltjes Integral, ONTI, 1938.

23. * Gnedenko, B. V. and Kolmogorov, A. N., Limit Distributions for Sums of
Independent Random Variables, Gostekhizdat, 1949.

24. Grenander, U., Probabilities on Algebraic Structures, New York, 1963.

25. Hannan, E. J., Time Series Analysis, Methuen and Co., London, 1960.

26. Harris, T. E., The Theory of Branching Processes, Springer-Verlag, 1963.

27. * Ito, K., Stochastic Processes, IL, “Matematika”, Issue 1, 1960; Issue 2, 1963.

28. * Khinchin, A. Ya., Basic Laws of Probability Theory, GTTI, 1932.

30. * Khinchin, A. Ya., Limit Laws for Sums of Independent Random Variables,
ONTI, 1938.

31. * Khinchin, A. Ya., Mathematical Foundations of Statistical Mechanics,
Gostekhizdat, 1943.

32. * Khinchin, A. Ya., Mathematical Foundations of Quantum Statistics,
Gostekhizdat, 1951.

33. * Khinchin, A. Ya., Mathematical Methods in the Study of Queues, Trudy Mat.
inst. imeni V. A. Steklova, Izd. Akad. Nauk S.S.S.R., 1955.

34. * Kolmogorov, A. N., Basic Concepts of Probability Theory, ONTI, 1936.

35. * Kubilius, I., Probabilistic Methods in Number Theory, Vilnius, 1959.

36. Laning, J. H. and Battin, R. H., Random Processes in Automatic Control,
New York, 1956.

37. Levy, P., Theorie de l’addition des variables aleatoires, Paris, 1937.

38. Levy, P., Processus stochastiques et mouvement brownien, Paris, 1948.

39. * Linnik, Yu. V., Decompositions of Probability Distributions, Izd. LGU,
1960.

40. Loeve, M., Probability Theory, Princeton, 3rd ed., 1963.

41. * Markov, A. A., The Calculus of Probabilities, 4th ed., GIZ, 1924.

42. * Meshalkin, L. D., Collection of Problems in Probability Theory, Izd. MGU,
1964.

43. Mises, R. von, Wahrscheinlichkeitsrechnung, 1931.

44. Mises, R. von, Probability, Statistics, and Truth, New York, 1939.

45. Onicescu, O., Mihoc, G., Jonescu Tulcea, C. T., Calculul probabilitatilor si
aplicatii, Bucuresti, 1956.

46. Parzen, E., Modern Probability Theory and Its Applications, John Wiley
and Sons, Inc., New York, 1960.

47. Renyi, A., Wahrscheinlichkeitsrechnung, Deutsche Verlag der Wissenschaften,
1962.

48. Richter, H., Wahrscheinlichkeitsrechnung, Springer-Verlag, 1956.

49. * Romanovsky, V. I., Discrete Markov Chains, Gostekhizdat, 1949.

50. Rosenblatt, M., Random Processes, Oxford University Press, N. Y., 1962.

51. * Rozanov, Yu. A., Stationary Stochastic Processes, Fizmatgiz, 1963.

52. Saaty, T., Elements of Queueing Theory with Its Applications, McGraw-Hill
Book Company, New York, 1961.

53. * Sarymsakov, T. A., Fundamentals in the Theory of Markov Processes,
Gostekhizdat, 1954.

54. * Sirazhdinov, S. Kh., Limit Theorems for Homogeneous Markov Chains, Izd.
Akad. Nauk Uz. S.S.R., 1955.

55. * Skorokhod, A. V., Studies in the Theory of Stochastic Processes, Izd. Kiev.
Universiteta, 1961.

56. * Skorokhod, A. V., Stochastic Processes with Independent Increments, Izd.
“Nauka”, 1964.

57. Todhunter, I., A History of the Mathematical Theory of Probability, 1865.

58. Tortrat, A., Calcul des probabilites, Masson et Cie, Paris, 1963.

59. * Ventstzel, E. S., Probability Theory, 3rd ed., Izd. “Nauka”, 1964.





JOURNALS 


Chapter 1 

60. * Bernstein, S. N., “On the Axiomatic Substantiation of Probability Theory”,
Reports of Kharkov Math. Society, Vol. 15 (1917).

61. * Khinchin, A. Ya., “Mises’ Theory of Probability and the Principles of
Physical Statistics”, Uspekhi fizich. nauk, Vol. IX, Issue 2 (1929).

62. * Khinchin, A. Ya., “The Method of Arbitrary Functions and the Struggle
Against Idealism in Probability Theory”, in Collection of articles entitled
Philosophical Problems of Modern Physics, Izd. Akad. Nauk S.S.S.R., 1952.

63. * Khinchin, A. Ya., “The Frequency Theory of Richard von Mises and Modern
Ideas in Probability Theory”, Voprosy filosofii, Nos. 1 and 2 (1961).

64. * Kolmogorov, A. N., “The Role of Russian Science in the Development of
Probability Theory”, Uchen. zap. MGU, Issue 91 (1947).

65. Theorie des probabilites. Exposes sur ses fondements et ses applications,
Paris, Gauthier-Villars, 1952.


Chapter 2 

66. * Bernstein, S. N., “Once again on the Question of the Accuracy of the Laplace
Limit Formula”, Izv. Akad. Nauk S.S.S.R., Vol. 7 (1943).

67. Feller, W., “On the Normal Approximation to the Binomial Distribution”,
Ann. Math. Stat., Vol. XVI (1945).

68. Khinchin, A. Ya., “Uber einen neuen Grenzwertsatz der
Wahrscheinlichkeitsrechnung”, Math. Annal., Vol. 101 (1929).

69. * Prokhorov, Yu. V., “Asymptotic Behaviour of the Binomial Distribution”,
Uspekhi matem. nauk, Vol. 8, Issue 3, pp. 136-142 (1953).

70. * Smirnov, N. V., “On the Probabilities of Large Deviations”, Mat. Sbornik,
Vol. 40, No. 4 (1933).


Chapter 3 

71. * Dobrushin, R. L., “Limit Theorems for Markov Chains of Two States”, Izv.
Akad. Nauk S.S.S.R., Ser. mat., 17, pp. 291-330 (1953).

72. * Dobrushin, R. L., “Central Limit Theorem for Nonhomogeneous Markov
Chains”, Probability Theory and Its Applications, Vol. 1, Issue 1, pp. 72-89,
Issue 4, pp. 365-425 (1956).

73. Doeblin, W., “Expose de la theorie des chaines simples constantes de Markoff
a un nombre fini d’etats”, Rev. math. de l’Union Interbalkanique, II, I (1938).

74. * Kolmogorov, A. N., “Markov Chains with Countable Number of Possible
States”, Bull. MGU, Vol. I, Issue 3 (1937).

75. * Markov, A. A., “Investigation of a Remarkable Case of Dependent Trials”,
Izv. Ros. Akad. Nauk, Vol. 1 (1907).

See also appropriate chapters in the books of Bernstein and Feller given in the
list of textbooks and monographs; also in Romanovsky and Frechet (Vol. 2). The
books of Doob and Sarymsakov contain extensive bibliographies on Markov chains.


Chapter 4 

76. Cramer, H., “Uber eine Eigenschaft der normalen Verteilungsfunktion”, Math.
Zeitschr., Vol. 41 (1936).

77. * Raikov, D. A., “On the Decomposition of the Laws of Gauss and Poisson”,
Izv. Akad. Nauk S.S.S.R., Ser. mat., pp. 91-124 (1938).

78. * Skitovich, V. P., “Linear Forms of Independent Random Variables and the
Normal Distribution Law”, Izv. Akad. Nauk S.S.S.R., 18 (952) (1954).





Chapter 6 

79. * Bernstein, S. N., “On the Law of Large Numbers”, Soobshch. Khark. Mat.
Obshchestva, Vol. XVI (1918).

80. * Chebyshev, P. L., “On Mean Quantities”, Mat. Sbornik, Vol. 2 (1867);
Complete Works, Vol. 2, 1948.

81. Hajek, I. and Renyi, A., “Generalization of an Inequality of Kolmogorov”,
Acta Math. Acad. Sc. Hungaricae, t. VI, fasc. 3-4, pp. 281-283 (1955).

82. * Kolmogorov, A. N., “Sur la loi forte des grands nombres”, C. R. Acad. Sci.,
Paris, 191, 910-912 (1930).

83. * Prokhorov, Yu. V., “On the Strong Law of Large Numbers”, Doklady Akad.
Nauk S.S.S.R., Vol. 69, No. 5 (1949).

84. Slutsky, E. E., “Uber stochastische Asymptoten und Grenzwerte”, Metron
5 (1925); Selected Works, Izd. Akad. Nauk S.S.S.R., 1960.

Chapter 7 

85. * Gnedenko, B. V., “On Characteristic Functions”, Bull. MGU, Vol. 1, Issue
5 (1937).

86. * Khinchin, A. Ya., “On a Criterion for Characteristic Functions”, Bull. MGU,
Vol. 1, Issue 5 (1937).

87. * Krein, M. G., “On the Representation of Functions by Means of
Fourier-Stieltjes Integrals”, Uchen. zap. Kuibyshevsk. ped. inst., Issue 7 (1943).

88. * Raikov, D. A., “On Positive Definite Functions”, Doklady Akad. Nauk
S.S.S.R., Vol. XXVI, No. 9, pp. 857-862 (1940).

Chapter 8 

89. * Bernstein, S. N., “Extending the Limit Theorem of Probability Theory to
Sums of Dependent Variables”, Uspekhi mat. nauk, Issue 10 (1944).

90. * Chebyshev, P. L., “On Two Theorems Concerning Probability”, Zap. Akad.
Nauk (1887); Complete Works, Vol. 2, 1948.

91. Esseen, C. G., “Fourier Analysis of Distribution Functions. A Mathematical
Study of the Laplace-Gaussian Law”, Acta Math., Vol. 77 (1945).

92. Feller, W., “Uber den zentralen Grenzwertsatz der
Wahrscheinlichkeitsrechnung”, Math. Zeitschr., Vol. 40 (1935).

93. * Gnedenko, B. V., “Elements of the Theory of Distribution Functions of
Random Vectors”, Uspekhi mat. nauk, Issue 10 (1944).

94. * Gnedenko, B. V., “On the Local Limit Theorem of Probability Theory”,
Uspekhi mat. nauk, Vol. III, Issue 3 (1948).

95. * Gnedenko, B. V., “The Local Limit Theorem for Densities”, Doklady Akad.
Nauk S.S.S.R., Vol. 95, No. 1 (1954).

96. Lindeberg, J. W., “Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung”, Math. Zeitschr., Vol. 15 (1922).

97. * Linnik, Yu. V., “On the Accuracy of the Approach of Sums of Independent
Random Variables to the Gaussian Distribution”, Izv. Akad. Nauk S.S.S.R.,
Vol. 11 (1947).

98. * Lyapunov, A. M., “Sur une proposition de la theorie des probabilites”, Bull.
Acad. Sc. Peter., 13 (1900).

99. * Lyapunov, A. M., “Nouvelle forme du theoreme sur la limite des probabilites”,
ibid. (1901).

100. * Prokhorov, Yu. V., “Local Theorem for Densities”, Doklady Akad. Nauk
S.S.S.R., Vol. 83, No. 6 (1952).

Chapter 9 

101. Bavli, G. M., “Uber einige Verallgemeinerungen der Grenzwertsatz der
Wahrscheinlichkeitsrechnung”, Mat. Sbornik, Vol. I (43), No. 6 (1936).





102. * Gnedenko, B. V., “On a Characteristic Property of Infinitely Divisible
Distribution Laws”, Bull. MGU, Vol. I, Issue 5 (1937).

103. * Gnedenko, B. V., “Limit Laws for Sums of Independent Random Variables”,
Uspekhi mat. nauk, Issue 10 (1944).

104. * Khinchin, A. Ya., “A New Derivation of a Formula by P. Levy”, Bull. MGU,
Vol. I, Issue 1 (1937).


Chapter 10 

105. Cramer, H., “On Harmonic Analysis in Certain Continuous Functional Spaces”,
Ark. Mat. Astr. Fys., 28B, No. 12 (1942).

106. * Dubrovsky, V. M., “Generalizing the Theory of Purely Discontinuous
Stochastic Processes”, Doklady Akad. Nauk S.S.S.R., Vol. XIX (1938).

107. * Dubrovsky, V. M., “An Investigation of Purely Discontinuous Stochastic
Processes by the Method of Integro-Differential Equations”, Izv. Akad. Nauk
S.S.S.R., Vol. 8 (1944).

108. Feller, W., “On the Theory of Stochastic Processes”, Uspekhi mat. nauk, Issue
5 (1938).

109. Karhunen, K., “Uber lineare Methoden in der Wahrscheinlichkeitsrechnung”,
Ann. Acad. Sci. Fennicae, A, I, No. 37, Helsinki (1947).

110. * Khinchin, A. Ya., “Correlation Theory of Stationary Stochastic Processes”,
Uspekhi mat. nauk, Issue 5 (1938).

111. * Kolmogorov, A. N., “Simplified Proof of the Ergodic Theorem of
Birkhoff-Khinchin”, Uspekhi mat. nauk, Issue 5 (1938).

112. * Kolmogorov, A. N., “On Analytical Methods in Probability Theory”, Uspekhi
mat. nauk, Issue 5 (1938).

113. * Kolmogorov, A. N., “Interpolation and Extrapolation of Stationary Random
Sequences”, Izv. Akad. Nauk S.S.S.R. (1941).

114. * Kolmogorov, A. N., “A Statistical Theory of Vibrations with Continuous
Spectrum”, Jubil. Sbornik Akad. Nauk S.S.S.R., Part I (1947).

115. * Kolmogorov, A. N. and Dmitriev, N. A., “Branching Stochastic Processes”,
Doklady Akad. Nauk S.S.S.R., Vol. 56, No. 1 (1947).

116. * Kolmogorov, A. N. and Sevastyanov, B. A., “Computation of Final
Probabilities for Branching Stochastic Processes”, Doklady Akad. Nauk S.S.S.R.,
Vol. 56, No. 8 (1947).

117. Loeve, M., “Sur les fonctions aleatoires stationnaires de second ordre”, Rev.
Sci., 83, No. 5 (1945).

118. Loeve, M., “Fonctions aleatoires a decomposition orthogonale exponentielle”,
Rev. Sci., 84, No. 3 (1946).

119. Maruyama, G., “The Harmonic Analysis of Stationary Stochastic Processes”,
Mem. Fac. Sc. Kyusyu Univ., A, 4, No. 1 (1949).

120. * Rozanov, Yu. A., “The Spectral Theory of Multidimensional Stationary
Processes with Discrete Time”, Uspekhi mat. nauk, Issue 2 (1958).

121. * Sevastyanov, B. A., “The Theory of Branching Stochastic Processes”,
Uspekhi mat. nauk, Vol. 6, Issue 6 (1951).

122. * Yaglom, A. M., “On the Question of Linear Interpolation of Stationary
Stochastic Sequences and Processes”, Uspekhi mat. nauk, Vol. IV, Issue 4 (1949).

123. * Yaglom, A. M., “Introduction to the Theory of Stationary Stochastic
Functions”, Uspekhi mat. nauk, Vol. VII, Issue 5 (1952).


Chapter 11 

124. * Belyaev, Yu. K., “Line Markov Processes and Their Application to Problems
in Reliability Theory”, Transactions of the 6th All-Union Conference on
Probability Theory and Mathematical Statistics, Vilnius, 1962, pp. 309-323.





125. * Belyaev, Yu. K., Gnedenko, B. V., and Kovalenko, I. N., “Basic Trends
in Queueing Theory”, Transactions of the 6th All-Union Conference on
Probability Theory and Mathematical Statistics, Vilnius, 1962, pp. 341-355.

126. * Gnedenko, B. V., “On Non-loaded Duplication”, Technical Cybernetics, No. 4,
pp. 3-12 (1964).

127. * Gnedenko, B. V., “On Duplication with Renewal”, Technical Cybernetics,
No. 5, pp. 111-118 (1964).

128. * Grigelionis, B., “On the Convergence of Sums of Step Stochastic Processes
to a Poisson Process”, Probability Theory and Its Applications, Vol. 8, Issue 2,
pp. 189-194 (1963).

129. Kendall, D., “Stochastic Processes Occurring in the Theory of Queues and Their
Analysis by the Method of Imbedded Markov Chains”, A Collection of
Translations “Matematika”, Vol. 3, No. 6, pp. 97-111, 1959.

130. Kendall, D. G., “Some Recent Works and Further Problems in the Theory
of Queues”, Probability Theory and Its Applications, Vol. 9, Issue 1, pp. 3-15,
1964.

131. * Khinchin, A. Ya., “The Mathematical Theory of a Stationary Queue”, Mat.
Sbornik, Vol. 39, No. 4, 73-84 (1932); see also Khinchin, A. Ya., Studies in
Queueing Theory, Fizmatgiz, 1963.

132. Kiefer and Wolfowitz, “On the Theory of Queues with Many Servers”, Trans.
Am. Math. Soc., 78, 1-18 (1955).

133. * Klimov, G. P., “Extremal Problems in the Theory of Queues”, Sbornik
Cybernetics at the Service of Communism, Vol. 2, Izd. “Energiya”, pp. 310-325,
1964.

134. * Kovalenko, I. N., “Some Problems of Queueing Theory with Restrictions”,
Probability Theory and Its Applications, Vol. 6, No. 1, pp. 222-228, 1961.

135. * Kovalenko, I. N., “Certain Analytical Methods in Queueing Theory”, Sbornik
Cybernetics at the Service of Communism, Vol. 2, Izd. “Energiya”, pp. 325-338,
1964.

136. Lindley, “The Theory of Queues with a Single Server”, Proc. Cambridge Phil.
Soc., Vol. 48, 277-289 (1952).

137. Maryanovich, T. P., “Reliability of Systems with Loaded Reserve”, Doklady
Akad. Nauk Ukr. S.S.R., No. 8, 964-967 (1961) (In Ukrainian).

138. Miller, R. G., “Priority Queues”, Ann. Math. Stat., Vol. 31, No. 1, 86-106
(1960).

139. * Ososkov, G. A., “A Limit Theorem for Flows of Homogeneous Events”,
Probability Theory and Its Applications, Vol. 1, No. 2, pp. 274-282, 1956.

140. * Sevastyanov, B. A., “An Ergodic Theorem for Markov Processes and Its
Application to Telephone Systems with Refusals”, Probability Theory and Its
Applications, Vol. 2, Issue 1, 1957.

141. * Shakhbazov, A. A. and Samandarov, E. G., “On Servicing a Non-ordinary
Flow”, Sbornik Cybernetics at the Service of Communism, Vol. 2, Izd.
“Energiya”, pp. 338-353, 1964.

142. Smith, W. L., “Renewal Theory and Its Ramifications”, J. Roy. Stat. Soc.,
Ser. B., Vol. 20, 1958.

143. * Solovyev, A. D., “On Stand-by Systems Without Renewal”, Sbornik
Cybernetics at the Service of Communism, Vol. 2, 83-122, Izd. “Energiya”, 1964.

144. Takacs, L., “Some Probability Problems in Telephone Traffic”, Acta Math.
Acad. Sci. Hung., Vol. 8 (1957) (In Hungarian).



SUBJECT INDEX 


Aftereffect, absence of 291, 294, 296, 297

Algebra of events, σ- 46
Ars conjectandi 206 
Averages, spatial 338 
Axiom of addition 47 

extended 48, 49, 50, 129 
Axiom of continuity 49, 50 
Axioms, definition of 45 
Kolmogorov’s 48 
Axioms of probability 47-51 
incompleteness of 48 


Banach’s match box problem 105 
Bayes’s theorem 57, 301 
Bernoulli’s formula 288 
Bernstein’s theorem 218 
Bertrand’s paradox 33, 44 
Binomial distribution 72 
Birkhoff-Khinchin ergodic theorem 338
Birth and death processes 346, 350 
Boltzmann statistics 28 
Borel field 46, 47 
Bose-Einstein statistics 28 
Brownian motion 101, 106, 288, 297, 
318, 323 

Buffon’s needle problem 35, 38 


Canonical representation (of normal 
law and Poisson law) 274 
Chain, Markov (see Markov chain)
205 291 302 

Coefficient, correlation 174, 175, 323 
diffusion 289 

Collectives (von Mises) 43 
Condition, compatibility 324 

Lindeberg’s 253, 254, 255, 258, 
oc;q 284 

Lyapunov’s 259, 267 
Markov 205 


self-compatibility 326 
symmetry 324
Constant, Euler’s 350 
Convergence in measure 210 
Convergence in probability 210 
Convergence of a sequence of random 
variables 209-212 

Convergence to the normal and Poisson laws, conditions for 282
Convolution 274 
Correlation theory 326 
Covariance 174 
Cumulants 191, 222 
Curve, Cantor 133 


Deciles 191 

Decomposition of stationary processes, spectral 331, 334
Degree of certainty of observer 16, 18 
Density of probability distribution 132 
Deviation, standard 180 
Die 20 

Difference (of events) 19 
Diffusion of gases 319 
Dispersion 169 
technical 300 

Distribution, Bernoulli 260 
Boltzmann 28 
chi-square (χ²) 148, 266
function of probabilities 125 
Distribution, function of a random 
variable 125, 127, 129, 130 
gamma 375 

Laplace 192, 285, 286 
lattice 260 
Maxwell 192 

n-dimensional normal 247 
nondegenerate (proper) n-dimensional normal 247
Pascal 191 





Poisson 98, 260, 364, 365 
Polya 191 
Student’s 152 
Weibull 375


Ellipses of equal probabilities 139 
Encounter problem 46 
Equation, Fokker-Planck 290 

generalized Markov 302, 303, 304, 
307, 313 

Kolmogorov’s first 304 
Kolmogorov’s second 306 
Markov 313 

Equations, Kolmogorov’s 303-311 
Kolmogorov-Feller 311, 312, 318 
Essai philosophique sur les probabilites (Laplace) 39
Event, certain 13, 20, 21, 47 

decomposable into particular 
events 21 
elementary 59 
impossible 13, 20, 21, 47 
random 12, 21, 45, 46 
laws of 22 
simple 61 
sure 12, 21 

Events, collectively dependent 55 
collectively independent 54 
complementary 47 
complete group of 21 
complete group of pairwise mutually exclusive 21
contrary 20, 47 
elementary 21, 45 
equivalent 19 
field of 21, 22, 46 
field of, Borel 46 
mutually exclusive 20, 47 
simple 21, 22, 45 

Existence of limiting values (von Mises) 43

Expectation, mathematical 164-169, 
176, 190 

theorems on 176-182 
Expectation, mathematical, defined in
the axiomatics of Kolmogorov 182-185

Expectation of a constant 176 
Expectation of a product 178 
Expectation of a sum 176 
Expectation, sign of 179 


Flow, elementary 341 
Formula, Bayes’ 298, 301
inversion 224, 227, 229 


Kolmogorov’s 285 
Stirling’s 75 
Taylor’s 305, 307 
of total probability 55, 299 
Formulas, Erlang’s 345, 352 
Formulas of Bayes 57 
Frequency and probability 38 
Function, Borel 128 

conditional density 300 
conditional distribution 134 
correlation 326, 329 
distribution 124, 125, 127, 129, 
130, 189, 190 

distribution (of a quotient) 150 
distribution (of a random variable) 125

distribution (of a sum) 143 
distribution, n-dimensional (of a 
random vector) 134 
probability density 132, 138 
Functions, characteristic 219-250 
characteristic (of multidimensional random variables) 243-248
conditional distribution 298 
jump 322 

multidimensional distribution 
134-142 

positive definite 239-242 
of random variables 142-155 


Games of chance 7 
Geometry, non-Euclidean 8 


Implication 18 
Independence of events 53 
Inequality, Bunyakovsky-Cauchy 171 
Cauchy-Bunyakovsky-Schwarz 327 
Inequality, Chebyshev’s 198, 199, 203, 
204, 205, 212, 213, 327 
Kolmogorov’s 213, 215 
Schwarz 171 

Integral, Lebesgue 169, 182, 183 
Poisson 132 
s-special 335, 336 
Stieltjes 143 

defined 155-160, 169, 273, 299, 
332 

stochastic 331 


Law, associative 22 

of binomial probability distribution 72
Cauchy’s 166, 171 
commutative 22





of the excluded middle 17 
of conservation of matter 13 
distribution 130 

of distribution, normal 167, 172 
distributive 22 
idempotency 22 
of large numbers 92, 195-218 
Chebyshev’s form of 198-206 
mass-scale phenomena and the 
195, 197 

a necessary and sufficient condition for 206
strong 209-218 

logarithmic normal distribution 
193 

Maxwell 149 

normal 268, 273, 282, 284, 320 
normal distribution 139 

Poisson 98, 126, 167, 178, 228, 
268, 273, 274, 282, 284, 298, 316, 
320 

Simpson distribution 145 
Student’s 153 

two-dimensional normal 175 
Laws, infinitely divisible distribution, canonical representation of 270
definition of 268 
theory of 267 
Length of an interval 336 
Likelihood, equal 18 
Loss of a call 340 


Markov chain, definition of 107-123, 
181 

Markov chain, simple 107 
Mass-scale operation 13 
Mass-scale phenomenon 10, 11 
Mathematical Methods of Statistics 
(Cramer) 40 
Matrix, covariance 173 

of transition probabilities 108 
transition 108 
Mean 190 

Mechanics, statistical 324 
Median of a distribution 190 
Mode of a distribution 191 
Molecular speeds 319 
Moment about the origin, Kth 185 
Moment, absolute 186, 188, 190 
central 185 
of the Kth order 185 
Moments 185-191 

mixed central (of the second order) 173
problems of 190 


Mortality tables 63 
Motion, Brownian 101, 103, 288, 297, 
318, 323 


Ordinariness 292, 294, 297 
Outcome, possible 23 


Paradox, Bertrand’s 33, 34, 36, 44 
Paradox of de Mere 67 
Period of a state 112 
Points, integral 89 
Probabilistic judgements 14 
Probabilistic regularities 13, 18 
Probability, axiomatic definition of 45 
of causes, Bayes’ rule for 57 
classical definition of 16, 18, 22, 
25 

conception of (von Mises) 43 
conditional 51, 52, 53, 56, 299 
of congestion 342 
definition of 47, 50 
different approaches to definition of 15, 16
of an event 25 
geometrical 33 

of hypotheses, Bayes’ formulas 
for 58 

mathematical 14, 15, 16, 17, 18, 
42 

statistical 41, 42 
statistical definition of 16, 32 
Probability, theory of 7, 11, 13, 14, 
16, 17 

theory of, axiomatic construction of 44-66

theory of, fundamental concept 
of (independence of events) 140 
total, formula of 56 
transition 108 
unconditional 51 

Probability, Statistics and Truth (von 
Mises) 44 

Problem, Banach’s match box 105 
Buffon’s needle 35, 38
encounter 32, (in production) 33, 
46 

Erlang’s 346 

of limit theorems for sums, statement of 278
Process, birth 346, 350 

without aftereffect 291, 302, 312, 
323, 324 

continuous stationary stochastic 
326 





continuous stochastic 303-311 
death 346 

Markov 290, 291, 343 
normal stochastic 328 
Poisson 291-298, 318, 341, 346 
Poisson (with leading function) 
363 

probabilistic 287, 288 
purely discontinuous stochastic 
311-318, (definition) 311 
random 287 
stationary 290, 291 
stationary stochastic 323 
step 362 

stochastic 287, 288, (definition) 
291, 295, 302 

Processes, birth and death 346-355 
with a discrete spectrum 330 
with independent increments, homogeneous stochastic 318-323
distribution function of 320 
Markovian 323 
stationary 324 
theory of 326 
Wiener 318 

Product (of events) 19 

Quantile of order p, distribution 191 

Queueing system, single-server 355-361

Queueing theory 9, 339-375 


Radioactive disintegration 319 
Random, at (explained) 35 and 37 
Random event 11, 12 
Random variable 8, 124, 125, 127 
normally distributed 126 
Randomness (von Mises) 43 
Ranging fire, Bayes’ formula in the 
theory of 58 
Rank of an interval 336 
Realization (of a stochastic process) 
291 

Reliability theory 9 
Relief, loaded 368 
nonloaded 368 
partially loaded 367 
stand-by (without repair) 349 


Scheme, Bernoulli 70, 71, 75, 91, 101, 
126, 181, 216 
Poisson 216 

Semi-invariants 191, 222 
Sequence of random variables, stationary 330


Series, Maclaurin 238 
Service system with a waiting line
(queue) 352 
Set of conditions 12 
Sets, Borel 49, 127, 131, 136 
Sets, Borel, fields of 49 
Space, n-dimensional Euclidean 135
probability 50 
sample 18, 21, 69 
of simple events 21 
Span, distribution 261 
of a distribution 261 
Stability of frequencies 39 
Stand-by systems, theory of 367 
State, essential 111, 112 
transient 111 
unessential 111, 112 
States, communicating 111 
Stationary 291, 294, 337 
Statistics, Boltzmann 28 
Bose-Einstein 28, 29 
Fermi-Dirac 28, 30 
of population 39 

Stochastic processes, theory of 287-338

Stochastic regularities 13 
Sum (of events) 19 

System, single-server (one-channel) 
355 


Theorem, Bayes’ 57 

of addition of probabilities 24 
Bernoulli’s 92, 164, 199, 206, 209 
Theorem, Bernstein’s 218 

Bochner-Khinchin 240, 249, 328 
Borel’s 209, 213, 216 
Chebyshev’s 199, 200, 203, 205 
classical limit 251-266 
converse limit 236, 239 
Cramer’s 148 

DeMoivre-Laplace 71, 116, 251, 
278 

of DeMoivre-Laplace integral 
84, 90, 91 

applications of 91, 120, 239 
of DeMoivre-Laplace, local 75, 78, 
83, 86, 96, 103, 104, 120 
direct limit 235 
ergodic 116 

ergodic (Birkhoff-Khinchin) 334-338

Feller’s 347 
Grigelionis 364 

Helly’s first 231-232, 236, 272, 
276, 277 





Helly’s generalized second 234, 
235 

Helly’s second 232-234, 236, 272, 
275, 276 

integral limit 84, 88 

Khinchin’s 203, 218, 327, 334 

Khinchin’s (on the correlation 

coefficient) 323 

Kolmogorov’s 215 

Lagrange 301 

Laplace 88, 201 

Lebesgue 230 

limit (for flows) 361 

limit (for infinitely divisible laws) 

275-278 

on limiting probabilities 113 
local 84 
local Laplace 75 
local limit 75 et seq., 259-266 
Lyapunov’s 254-259, 266, 278 
Markov’s 205, 206, 218 
Theorem, multiplication 53, 54, 56, 
296 

Poisson’s 96-101, 201, 285 
Slutsky’s 333 

uniqueness 224, 227, 228, 238 
Theorems, Helly’s 230-235 

limit (for characteristic functions) 235-239
limit (for sums) 279-282 
Theory of errors 7 


Theory of Markov chains 107 
Theory of operators, spectral 334 
Theory of stochastic (probabilistic, 
random) processes 8 
Traffic, incoming 341 
Transformations, Fourier 219 
Transforms, Laplace 369, 370, 372 
Trial 23 

definition of 70 

Trials, independent 69, (definition) 70 
Truncation, method of 203 


Value, principal (of a logarithm) 256 
Variable, lattice random 262 

multidimensional random 134 
n-dimensional random 134
one-dimensional random 136 
random 125, 126, 127 
Variables, continuous 132 
discrete 131 

uncorrelated random 329 
Variance 169-175, (defined) 169, 173, 
175 

of a constant 179 
of a sum 179, 180 
theorems on 179, 180 
Vector, random 134 

uniformly distributed random 136 
Venn diagram 19, 21 


Printed in the Union of Soviet Socialist Republics 



